[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/vocab.json

[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/merges.txt

[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/tokenizer.json

[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file added_tokens.json from cache at None

[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file special_tokens_map.json from cache at None

[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/tokenizer_config.json

[INFO|2025-04-10 02:12:51] tokenization_utils_base.py:2050 >> loading file chat_template.jinja from cache at None

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2313 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[INFO|2025-04-10 02:12:52] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:12:52] configuration_utils.py:771 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-7B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}
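The config above is enough to recompute the size of the base model. This sketch assumes Qwen2's usual layer layout: biases on q_proj/k_proj/v_proj only, no bias on o_proj, two RMSNorm weight vectors per decoder layer plus one final norm. It counts a separate lm_head because tie_word_embeddings is false. The helper name qwen2_base_params is hypothetical.

```python
# Hypothetical helper: recompute the base parameter count from the logged
# Qwen2Config fields. Assumes q/k/v carry biases, o_proj does not, each layer
# has two RMSNorm weight vectors, and lm_head is untied from the embeddings.
def qwen2_base_params(hidden=3584, inter=18944, layers=28, heads=28,
                      kv_heads=4, vocab=152064, tied=False):
    head_dim = hidden // heads               # 3584 / 28 = 128
    kv_dim = kv_heads * head_dim             # 4 KV heads -> 512 (grouped-query attention)
    attn = (hidden * hidden + hidden         # q_proj weight + bias
            + 2 * (hidden * kv_dim + kv_dim) # k_proj and v_proj weight + bias
            + hidden * hidden)               # o_proj weight, no bias
    mlp = 3 * hidden * inter                 # gate_proj, up_proj, down_proj
    norms = 2 * hidden                       # input and post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = vocab * hidden                   # embed_tokens
    lm_head = 0 if tied else vocab * hidden  # separate output projection
    return layers * per_layer + embed + lm_head + hidden  # + final norm

base = qwen2_base_params()
print(f"{base:,}")
```

It gives 7,615,616,512. Adding the 20,185,088 trainable LoRA parameters reported later yields 7,635,801,600, the `all params` figure in the log.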


[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/vocab.json

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/merges.txt

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/tokenizer.json

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file added_tokens.json from cache at None

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file special_tokens_map.json from cache at None

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/tokenizer_config.json

[INFO|2025-04-10 02:12:52] tokenization_utils_base.py:2050 >> loading file chat_template.jinja from cache at None

[INFO|2025-04-10 02:12:53] tokenization_utils_base.py:2313 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[INFO|2025-04-10 02:12:53] logging.py:143 >> Loading dataset catsith_output.json...

[INFO|2025-04-10 02:12:58] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:12:58] configuration_utils.py:771 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-7B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}


[INFO|2025-04-10 02:12:58] logging.py:143 >> KV cache is disabled during training.

[INFO|2025-04-10 02:12:58] modeling_utils.py:3982 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/model.safetensors.index.json

[INFO|2025-04-10 02:12:58] modeling_utils.py:1633 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.

[INFO|2025-04-10 02:12:58] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "use_cache": false
}


[WARNING|2025-04-10 02:12:58] logging.py:329 >> Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.

[INFO|2025-04-10 02:13:05] modeling_utils.py:4970 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.


[INFO|2025-04-10 02:13:05] modeling_utils.py:4978 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-7B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.

[INFO|2025-04-10 02:13:05] configuration_utils.py:1095 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/generation_config.json

[INFO|2025-04-10 02:13:05] configuration_utils.py:1140 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}


[INFO|2025-04-10 02:13:05] logging.py:143 >> Gradient checkpointing enabled.

[INFO|2025-04-10 02:13:05] logging.py:143 >> Using torch SDPA for faster training and inference.

[INFO|2025-04-10 02:13:05] logging.py:143 >> Upcasting trainable params to float32.

[INFO|2025-04-10 02:13:05] logging.py:143 >> Fine-tuning method: LoRA

[INFO|2025-04-10 02:13:05] logging.py:143 >> Found linear modules: v_proj,k_proj,gate_proj,down_proj,up_proj,q_proj,o_proj

[INFO|2025-04-10 02:13:06] logging.py:143 >> trainable params: 20,185,088 || all params: 7,635,801,600 || trainable%: 0.2643
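The trainable-parameter count is consistent with LoRA rank 8 on all seven projection modules listed above. The rank is an assumption, since the log does not print it. Each adapter on a Linear(in_features, out_features) adds an A matrix (r x in_features) and a B matrix (out_features x r), i.e. r * (in_features + out_features) weights. The k_proj/v_proj output width of 512 follows from 4 key-value heads of dimension 3584 / 28 = 128. lora_params is a hypothetical helper.

```python
# Assumption: LoRA rank r = 8 (not stated in the log).
# A LoRA adapter on Linear(in, out) adds A (r x in) + B (out x r) = r * (in + out).
def lora_params(r=8, hidden=3584, inter=18944, layers=28, heads=28, kv_heads=4):
    kv_dim = kv_heads * (hidden // heads)  # 4 * 128 = 512
    shapes = {                             # (in_features, out_features)
        "q_proj": (hidden, hidden),
        "k_proj": (hidden, kv_dim),
        "v_proj": (hidden, kv_dim),
        "o_proj": (hidden, hidden),
        "gate_proj": (hidden, inter),
        "up_proj": (hidden, inter),
        "down_proj": (inter, hidden),
    }
    per_layer = sum(r * (i + o) for i, o in shapes.values())
    return layers * per_layer

trainable = lora_params()
total = 7_635_801_600  # "all params" from the log (base model + LoRA weights)
print(f"{trainable:,} trainable, {100 * trainable / total:.4f}%")
```

The percentage is computed against the total that already includes the adapter weights, which is how the logged 0.2643 arises.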

[INFO|2025-04-10 02:13:06] trainer.py:746 >> Using auto half precision backend

[WARNING|2025-04-10 02:13:06] trainer.py:781 >> No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.

[INFO|2025-04-10 02:13:06] trainer.py:2405 >> ***** Running training *****

[INFO|2025-04-10 02:13:06] trainer.py:2406 >>   Num examples = 2,009

[INFO|2025-04-10 02:13:06] trainer.py:2407 >>   Num Epochs = 3

[INFO|2025-04-10 02:13:06] trainer.py:2408 >>   Instantaneous batch size per device = 2

[INFO|2025-04-10 02:13:06] trainer.py:2411 >>   Total train batch size (w. parallel, distributed & accumulation) = 16

[INFO|2025-04-10 02:13:06] trainer.py:2412 >>   Gradient Accumulation steps = 8

[INFO|2025-04-10 02:13:06] trainer.py:2413 >>   Total optimization steps = 375
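The reported step count follows from the batch settings. A total train batch size of 16 = 2 per device x 8 accumulation steps implies a single process. This sketch assumes the Trainer counts ceil(2009 / 2) = 1005 micro-batches per epoch and floors the number of optimizer updates per epoch (1005 // 8 = 125), which reproduces the logged 375. The function name is hypothetical.

```python
import math

def total_optimization_steps(num_examples, per_device_batch, grad_accum, epochs, world_size=1):
    """Return (total optimizer steps, optimizer steps per epoch)."""
    # Micro-batches each process iterates per epoch; the last partial batch is kept.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * world_size))
    # One optimizer step per grad_accum micro-batches, rounded down (at least 1).
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs, updates_per_epoch

effective_batch = 2 * 8 * 1  # per-device batch x accumulation steps x processes
steps, per_epoch = total_optimization_steps(2009, 2, 8, 3)
print(effective_batch, per_epoch, steps)
```

At 125 updates per epoch, the checkpoint-100 save lands at epoch 0.80, and each logging line (0.04 epoch apart) covers 5 optimizer steps.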

[INFO|2025-04-10 02:13:06] trainer.py:2414 >>   Number of trainable parameters = 20,185,088

[INFO|2025-04-10 02:13:25] logging.py:143 >> {'loss': 4.5494, 'learning_rate': 4.9978e-05, 'epoch': 0.04, 'throughput': 416.59}
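The learning_rate values match a cosine decay from a peak of 5e-5 to zero over the 375 optimizer steps, with no warmup. The peak, the schedule shape, and the epoch-to-step mapping (125 steps per epoch) are inferred from the logged numbers rather than read from the run's configuration.

```python
import math

PEAK_LR = 5e-5        # inferred: the first logged value is just below 5e-05
TOTAL_STEPS = 375     # "Total optimization steps" in the log
STEPS_PER_EPOCH = 125 # 375 steps / 3 epochs

def cosine_lr(step, peak=PEAK_LR, total=TOTAL_STEPS):
    # Cosine decay from peak (step 0) to 0 (final step), no warmup phase.
    return peak * 0.5 * (1.0 + math.cos(math.pi * step / total))

for epoch in (0.04, 0.80, 1.00, 2.00, 3.00):
    step = round(epoch * STEPS_PER_EPOCH)
    print(f"epoch {epoch:.2f} step {step:3d} lr {cosine_lr(step):.4e}")
```

The logged values at epochs 0.04, 0.80, 1.00, 2.00 and 3.00 are 4.9978e-05, 4.1728e-05, 3.7500e-05, 1.2500e-05 and 0.0000e+00.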

[INFO|2025-04-10 02:13:43] logging.py:143 >> {'loss': 4.5269, 'learning_rate': 4.9912e-05, 'epoch': 0.08, 'throughput': 375.28}

[INFO|2025-04-10 02:14:01] logging.py:143 >> {'loss': 4.0421, 'learning_rate': 4.9803e-05, 'epoch': 0.12, 'throughput': 376.49}

[INFO|2025-04-10 02:14:20] logging.py:143 >> {'loss': 3.8231, 'learning_rate': 4.9650e-05, 'epoch': 0.16, 'throughput': 390.13}

[INFO|2025-04-10 02:14:38] logging.py:143 >> {'loss': 3.5978, 'learning_rate': 4.9454e-05, 'epoch': 0.20, 'throughput': 372.36}

[INFO|2025-04-10 02:14:56] logging.py:143 >> {'loss': 3.5563, 'learning_rate': 4.9215e-05, 'epoch': 0.24, 'throughput': 358.30}

[INFO|2025-04-10 02:15:14] logging.py:143 >> {'loss': 3.7193, 'learning_rate': 4.8933e-05, 'epoch': 0.28, 'throughput': 353.55}

[INFO|2025-04-10 02:15:32] logging.py:143 >> {'loss': 3.4887, 'learning_rate': 4.8609e-05, 'epoch': 0.32, 'throughput': 350.40}

[INFO|2025-04-10 02:15:50] logging.py:143 >> {'loss': 3.4482, 'learning_rate': 4.8244e-05, 'epoch': 0.36, 'throughput': 346.39}

[INFO|2025-04-10 02:16:08] logging.py:143 >> {'loss': 3.5251, 'learning_rate': 4.7839e-05, 'epoch': 0.40, 'throughput': 339.62}

[INFO|2025-04-10 02:16:26] logging.py:143 >> {'loss': 3.4091, 'learning_rate': 4.7393e-05, 'epoch': 0.44, 'throughput': 344.36}

[INFO|2025-04-10 02:16:45] logging.py:143 >> {'loss': 3.4556, 'learning_rate': 4.6908e-05, 'epoch': 0.48, 'throughput': 351.95}

[INFO|2025-04-10 02:17:03] logging.py:143 >> {'loss': 3.3753, 'learning_rate': 4.6384e-05, 'epoch': 0.52, 'throughput': 355.56}

[INFO|2025-04-10 02:17:22] logging.py:143 >> {'loss': 3.3786, 'learning_rate': 4.5823e-05, 'epoch': 0.56, 'throughput': 353.28}

[INFO|2025-04-10 02:17:40] logging.py:143 >> {'loss': 3.4089, 'learning_rate': 4.5225e-05, 'epoch': 0.60, 'throughput': 349.53}

[INFO|2025-04-10 02:17:58] logging.py:143 >> {'loss': 3.4508, 'learning_rate': 4.4592e-05, 'epoch': 0.64, 'throughput': 352.33}

[INFO|2025-04-10 02:18:16] logging.py:143 >> {'loss': 3.3330, 'learning_rate': 4.3925e-05, 'epoch': 0.68, 'throughput': 353.37}

[INFO|2025-04-10 02:18:35] logging.py:143 >> {'loss': 3.3651, 'learning_rate': 4.3224e-05, 'epoch': 0.72, 'throughput': 357.01}

[INFO|2025-04-10 02:18:52] logging.py:143 >> {'loss': 3.2850, 'learning_rate': 4.2492e-05, 'epoch': 0.76, 'throughput': 355.97}

[INFO|2025-04-10 02:19:10] logging.py:143 >> {'loss': 3.3726, 'learning_rate': 4.1728e-05, 'epoch': 0.80, 'throughput': 356.11}

[INFO|2025-04-10 02:19:10] trainer.py:3942 >> Saving model checkpoint to saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-100

[INFO|2025-04-10 02:19:11] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:19:11] configuration_utils.py:771 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}


[INFO|2025-04-10 02:19:11] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-100/tokenizer_config.json

[INFO|2025-04-10 02:19:11] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-100/special_tokens_map.json

[INFO|2025-04-10 02:19:30] logging.py:143 >> {'loss': 3.3490, 'learning_rate': 4.0936e-05, 'epoch': 0.84, 'throughput': 360.08}

[INFO|2025-04-10 02:19:48] logging.py:143 >> {'loss': 3.3335, 'learning_rate': 4.0115e-05, 'epoch': 0.88, 'throughput': 358.69}

[INFO|2025-04-10 02:20:06] logging.py:143 >> {'loss': 3.3096, 'learning_rate': 3.9268e-05, 'epoch': 0.92, 'throughput': 356.95}

[INFO|2025-04-10 02:20:24] logging.py:143 >> {'loss': 3.2696, 'learning_rate': 3.8396e-05, 'epoch': 0.96, 'throughput': 354.39}

[INFO|2025-04-10 02:20:43] logging.py:143 >> {'loss': 3.3071, 'learning_rate': 3.7500e-05, 'epoch': 1.00, 'throughput': 357.59}

[INFO|2025-04-10 02:21:01] logging.py:143 >> {'loss': 3.8956, 'learning_rate': 3.6582e-05, 'epoch': 1.04, 'throughput': 355.28}

[INFO|2025-04-10 02:21:19] logging.py:143 >> {'loss': 3.2505, 'learning_rate': 3.5644e-05, 'epoch': 1.08, 'throughput': 353.84}

[INFO|2025-04-10 02:21:38] logging.py:143 >> {'loss': 3.2490, 'learning_rate': 3.4688e-05, 'epoch': 1.12, 'throughput': 355.97}

[INFO|2025-04-10 02:21:56] logging.py:143 >> {'loss': 3.1777, 'learning_rate': 3.3714e-05, 'epoch': 1.16, 'throughput': 354.36}

[INFO|2025-04-10 02:22:14] logging.py:143 >> {'loss': 3.2005, 'learning_rate': 3.2725e-05, 'epoch': 1.20, 'throughput': 354.54}

[INFO|2025-04-10 02:22:32] logging.py:143 >> {'loss': 3.1882, 'learning_rate': 3.1723e-05, 'epoch': 1.24, 'throughput': 356.67}

[INFO|2025-04-10 02:22:50] logging.py:143 >> {'loss': 3.2861, 'learning_rate': 3.0709e-05, 'epoch': 1.28, 'throughput': 354.95}

[INFO|2025-04-10 02:23:08] logging.py:143 >> {'loss': 3.2012, 'learning_rate': 2.9685e-05, 'epoch': 1.32, 'throughput': 352.93}

[INFO|2025-04-10 02:23:27] logging.py:143 >> {'loss': 3.1424, 'learning_rate': 2.8652e-05, 'epoch': 1.36, 'throughput': 353.98}

[INFO|2025-04-10 02:23:45] logging.py:143 >> {'loss': 3.1061, 'learning_rate': 2.7613e-05, 'epoch': 1.40, 'throughput': 351.42}

[INFO|2025-04-10 02:24:04] logging.py:143 >> {'loss': 3.1739, 'learning_rate': 2.6570e-05, 'epoch': 1.44, 'throughput': 352.41}

[INFO|2025-04-10 02:24:22] logging.py:143 >> {'loss': 3.1692, 'learning_rate': 2.5524e-05, 'epoch': 1.48, 'throughput': 353.70}

[INFO|2025-04-10 02:24:41] logging.py:143 >> {'loss': 3.2116, 'learning_rate': 2.4476e-05, 'epoch': 1.52, 'throughput': 355.00}

[INFO|2025-04-10 02:24:58] logging.py:143 >> {'loss': 3.3134, 'learning_rate': 2.3430e-05, 'epoch': 1.56, 'throughput': 355.61}

[INFO|2025-04-10 02:25:17] logging.py:143 >> {'loss': 3.2215, 'learning_rate': 2.2387e-05, 'epoch': 1.60, 'throughput': 356.20}

[INFO|2025-04-10 02:25:17] trainer.py:3942 >> Saving model checkpoint to saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-200

[INFO|2025-04-10 02:25:17] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:25:17] configuration_utils.py:771 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}


[INFO|2025-04-10 02:25:17] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-200/tokenizer_config.json

[INFO|2025-04-10 02:25:17] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-200/special_tokens_map.json

[INFO|2025-04-10 02:25:36] logging.py:143 >> {'loss': 3.0594, 'learning_rate': 2.1348e-05, 'epoch': 1.64, 'throughput': 354.14}

[INFO|2025-04-10 02:25:55] logging.py:143 >> {'loss': 3.1452, 'learning_rate': 2.0315e-05, 'epoch': 1.68, 'throughput': 356.32}

[INFO|2025-04-10 02:26:14] logging.py:143 >> {'loss': 3.2481, 'learning_rate': 1.9291e-05, 'epoch': 1.72, 'throughput': 356.70}

[INFO|2025-04-10 02:26:34] logging.py:143 >> {'loss': 3.1269, 'learning_rate': 1.8277e-05, 'epoch': 1.76, 'throughput': 359.85}

[INFO|2025-04-10 02:26:52] logging.py:143 >> {'loss': 3.1888, 'learning_rate': 1.7275e-05, 'epoch': 1.80, 'throughput': 360.92}

[INFO|2025-04-10 02:27:10] logging.py:143 >> {'loss': 3.2137, 'learning_rate': 1.6286e-05, 'epoch': 1.84, 'throughput': 359.60}

[INFO|2025-04-10 02:27:28] logging.py:143 >> {'loss': 3.1926, 'learning_rate': 1.5312e-05, 'epoch': 1.88, 'throughput': 358.00}

[INFO|2025-04-10 02:27:46] logging.py:143 >> {'loss': 3.2949, 'learning_rate': 1.4356e-05, 'epoch': 1.92, 'throughput': 357.75}

[INFO|2025-04-10 02:28:04] logging.py:143 >> {'loss': 3.1559, 'learning_rate': 1.3418e-05, 'epoch': 1.96, 'throughput': 356.64}

[INFO|2025-04-10 02:28:22] logging.py:143 >> {'loss': 3.1411, 'learning_rate': 1.2500e-05, 'epoch': 2.00, 'throughput': 355.99}

[INFO|2025-04-10 02:28:40] logging.py:143 >> {'loss': 3.5418, 'learning_rate': 1.1604e-05, 'epoch': 2.04, 'throughput': 354.77}

[INFO|2025-04-10 02:28:58] logging.py:143 >> {'loss': 3.0061, 'learning_rate': 1.0732e-05, 'epoch': 2.08, 'throughput': 354.05}

[INFO|2025-04-10 02:29:16] logging.py:143 >> {'loss': 3.1020, 'learning_rate': 9.8850e-06, 'epoch': 2.12, 'throughput': 352.60}

[INFO|2025-04-10 02:29:34] logging.py:143 >> {'loss': 2.9862, 'learning_rate': 9.0644e-06, 'epoch': 2.16, 'throughput': 351.35}

[INFO|2025-04-10 02:29:53] logging.py:143 >> {'loss': 3.0803, 'learning_rate': 8.2717e-06, 'epoch': 2.20, 'throughput': 352.72}

[INFO|2025-04-10 02:30:11] logging.py:143 >> {'loss': 3.1944, 'learning_rate': 7.5084e-06, 'epoch': 2.24, 'throughput': 352.51}

[INFO|2025-04-10 02:30:29] logging.py:143 >> {'loss': 3.1717, 'learning_rate': 6.7758e-06, 'epoch': 2.28, 'throughput': 354.26}

[INFO|2025-04-10 02:30:49] logging.py:143 >> {'loss': 3.2106, 'learning_rate': 6.0751e-06, 'epoch': 2.32, 'throughput': 355.43}

[INFO|2025-04-10 02:31:07] logging.py:143 >> {'loss': 3.0577, 'learning_rate': 5.4077e-06, 'epoch': 2.36, 'throughput': 356.33}

[INFO|2025-04-10 02:31:25] logging.py:143 >> {'loss': 3.1227, 'learning_rate': 4.7746e-06, 'epoch': 2.40, 'throughput': 356.28}

[INFO|2025-04-10 02:31:25] trainer.py:3942 >> Saving model checkpoint to saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-300

[INFO|2025-04-10 02:31:25] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:31:25] configuration_utils.py:771 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}


[INFO|2025-04-10 02:31:25] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-300/tokenizer_config.json

[INFO|2025-04-10 02:31:25] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-300/special_tokens_map.json

[INFO|2025-04-10 02:31:44] logging.py:143 >> {'loss': 3.1015, 'learning_rate': 4.1770e-06, 'epoch': 2.44, 'throughput': 355.52}

[INFO|2025-04-10 02:32:02] logging.py:143 >> {'loss': 3.0393, 'learning_rate': 3.6159e-06, 'epoch': 2.48, 'throughput': 355.42}

[INFO|2025-04-10 02:32:20] logging.py:143 >> {'loss': 2.9964, 'learning_rate': 3.0923e-06, 'epoch': 2.52, 'throughput': 354.97}

[INFO|2025-04-10 02:32:38] logging.py:143 >> {'loss': 3.1617, 'learning_rate': 2.6072e-06, 'epoch': 2.56, 'throughput': 355.40}

[INFO|2025-04-10 02:32:58] logging.py:143 >> {'loss': 3.0388, 'learning_rate': 2.1614e-06, 'epoch': 2.60, 'throughput': 356.66}

[INFO|2025-04-10 02:33:16] logging.py:143 >> {'loss': 3.1580, 'learning_rate': 1.7556e-06, 'epoch': 2.64, 'throughput': 356.13}

[INFO|2025-04-10 02:33:34] logging.py:143 >> {'loss': 3.0348, 'learning_rate': 1.3906e-06, 'epoch': 2.68, 'throughput': 356.25}

[INFO|2025-04-10 02:33:52] logging.py:143 >> {'loss': 3.1515, 'learning_rate': 1.0670e-06, 'epoch': 2.72, 'throughput': 355.61}

[INFO|2025-04-10 02:34:11] logging.py:143 >> {'loss': 3.0978, 'learning_rate': 7.8542e-07, 'epoch': 2.76, 'throughput': 356.26}

[INFO|2025-04-10 02:34:29] logging.py:143 >> {'loss': 3.0510, 'learning_rate': 5.4631e-07, 'epoch': 2.80, 'throughput': 356.49}

[INFO|2025-04-10 02:34:47] logging.py:143 >> {'loss': 3.0632, 'learning_rate': 3.5010e-07, 'epoch': 2.84, 'throughput': 356.36}

[INFO|2025-04-10 02:35:05] logging.py:143 >> {'loss': 3.0875, 'learning_rate': 1.9713e-07, 'epoch': 2.88, 'throughput': 355.21}

[INFO|2025-04-10 02:35:23] logging.py:143 >> {'loss': 3.1218, 'learning_rate': 8.7679e-08, 'epoch': 2.92, 'throughput': 356.10}

[INFO|2025-04-10 02:35:42] logging.py:143 >> {'loss': 3.0581, 'learning_rate': 2.1929e-08, 'epoch': 2.96, 'throughput': 357.04}

[INFO|2025-04-10 02:36:00] logging.py:143 >> {'loss': 2.9967, 'learning_rate': 0.0000e+00, 'epoch': 3.00, 'throughput': 356.12}

[INFO|2025-04-10 02:36:00] trainer.py:3942 >> Saving model checkpoint to saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-375

[INFO|2025-04-10 02:36:01] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:36:01] configuration_utils.py:771 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}


[INFO|2025-04-10 02:36:01] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-375/tokenizer_config.json

[INFO|2025-04-10 02:36:01] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/checkpoint-375/special_tokens_map.json

[INFO|2025-04-10 02:36:02] trainer.py:2657 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|2025-04-10 02:36:02] trainer.py:3942 >> Saving model checkpoint to saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13

[INFO|2025-04-10 02:36:02] configuration_utils.py:699 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-7B/snapshots/d149729398750b98c0af14eb82c78cfe92750796/config.json

[INFO|2025-04-10 02:36:02] configuration_utils.py:771 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
}


[INFO|2025-04-10 02:36:02] tokenization_utils_base.py:2500 >> tokenizer config file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/tokenizer_config.json

[INFO|2025-04-10 02:36:02] tokenization_utils_base.py:2509 >> Special tokens file saved in saves/Qwen2.5-7B/lora/train_2025-04-10-02-02-13/special_tokens_map.json

[WARNING|2025-04-10 02:36:02] logging.py:148 >> No metric eval_loss to plot.

[WARNING|2025-04-10 02:36:02] logging.py:148 >> No metric eval_accuracy to plot.

[INFO|2025-04-10 02:36:02] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}