How to use chrisrutherford/Qwen3.5-4B-Base-PumlGenV3 with libraries and local apps.

Libraries

How to use chrisrutherford/Qwen3.5-4B-Base-PumlGenV3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="chrisrutherford/Qwen3.5-4B-Base-PumlGenV3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("chrisrutherford/Qwen3.5-4B-Base-PumlGenV3")
model = AutoModelForMultimodalLM.from_pretrained("chrisrutherford/Qwen3.5-4B-Base-PumlGenV3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use chrisrutherford/Qwen3.5-4B-Base-PumlGenV3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
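
The same request can also be sent from Python. The sketch below uses only the standard library; `build_chat_payload` and `post_chat` are hypothetical helper names introduced here for illustration, and the default base URL assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request

MODEL_ID = "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build an OpenAI-style chat completion body (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Images are passed by URL, mirroring the curl request above.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, passing the result of `build_chat_payload(...)` to `post_chat` should return a dictionary whose reply text sits under `["choices"][0]["message"]["content"]`. For a server on another port, such as 30000, pass `base_url="http://localhost:30000"`.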

Use Docker

docker model run hf.co/chrisrutherford/Qwen3.5-4B-Base-PumlGenV3

SGLang

How to use chrisrutherford/Qwen3.5-4B-Base-PumlGenV3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "chrisrutherford/Qwen3.5-4B-Base-PumlGenV3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use chrisrutherford/Qwen3.5-4B-Base-PumlGenV3 with Docker Model Runner:
```
docker model run hf.co/chrisrutherford/Qwen3.5-4B-Base-PumlGenV3
```

Training log for Qwen3.5-4B-Base-PumlGenV3 (file size: 143,442 bytes; revision d60d79c)

[INFO|2026-04-20 15:20:42] image_processing_base.py:344 >> loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/preprocessor_config.json
[INFO|2026-04-20 15:20:42] processing_utils.py:1095 >> loading configuration file processor_config.json from cache at None
[INFO|2026-04-20 15:20:43] image_processing_base.py:344 >> loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/preprocessor_config.json
[INFO|2026-04-20 15:20:43] image_processing_base.py:344 >> loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/preprocessor_config.json
[INFO|2026-04-20 15:20:43] image_processing_base.py:377 >> Image processor Qwen2VLImageProcessorFast {
  "data_format": "channels_first",
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "Qwen2VLImageProcessorFast",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "merge_size": 2,
  "patch_size": 16,
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "longest_edge": 16777216,
    "shortest_edge": 65536
  },
  "temporal_patch_size": 2
}
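
The image processor settings above can be read as simple arithmetic. The sketch below is illustrative only: it shows how rescaling by 1/255 followed by normalization with mean 0.5 and std 0.5 maps 8-bit values to roughly [-1, 1], and how many merged visual tokens a pixel budget yields. It assumes that `shortest_edge` and `longest_edge` act as minimum and maximum total pixel counts, as in the Qwen2-VL family of image processors; the function names are hypothetical.

```python
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255, from the config above
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5
PATCH_SIZE = 16
MERGE_SIZE = 2


def normalize_pixel(value):
    """Rescale an 8-bit channel value, then normalize it."""
    return (value * RESCALE_FACTOR - IMAGE_MEAN) / IMAGE_STD


def merged_tokens(pixel_count):
    """Merged visual tokens for a pixel budget (assumed interpretation)."""
    # Each merged token covers a (patch_size * merge_size)^2 pixel area.
    pixels_per_token = (PATCH_SIZE * MERGE_SIZE) ** 2
    return pixel_count // pixels_per_token


print(normalize_pixel(0), normalize_pixel(255))
print(merged_tokens(65536), merged_tokens(16777216))
```

Under that assumption, the smallest budget (65536 pixels, a 256 x 256 image) gives 64 tokens and the largest (16777216 pixels, 4096 x 4096) gives 16384.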

[INFO|2026-04-20 15:20:43] configuration_utils.py:670 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/config.json
[INFO|2026-04-20 15:20:43] configuration_utils.py:742 >> Model config Qwen3_5Config {
  "architectures": [
    "Qwen3_5ForConditionalGeneration"
  ],
  "image_token_id": 248056,
  "model_type": "qwen3_5",
  "text_config": {
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attn_output_gate": true,
    "bos_token_id": null,
    "dtype": "bfloat16",
    "eos_token_id": 248044,
    "full_attention_interval": 4,
    "head_dim": 256,
    "hidden_act": "silu",
    "hidden_size": 2560,
    "initializer_range": 0.02,
    "intermediate_size": 9216,
    "layer_types": [
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention"
    ],
    "linear_conv_kernel_dim": 4,
    "linear_key_head_dim": 128,
    "linear_num_key_heads": 16,
    "linear_num_value_heads": 32,
    "linear_value_head_dim": 128,
    "mamba_ssm_dtype": "float32",
    "max_position_embeddings": 262144,
    "mlp_only_layers": [],
    "model_type": "qwen3_5_text",
    "mtp_num_hidden_layers": 1,
    "mtp_use_dedicated_embeddings": false,
    "num_attention_heads": 16,
    "num_hidden_layers": 32,
    "num_key_value_heads": 4,
    "pad_token_id": null,
    "partial_rotary_factor": 0.25,
    "rms_norm_eps": 1e-06,
    "rope_parameters": {
      "mrope_interleaved": true,
      "mrope_section": [
        11,
        11,
        10
      ],
      "partial_rotary_factor": 0.25,
      "rope_theta": 10000000,
      "rope_type": "default"
    },
    "tie_word_embeddings": true,
    "use_cache": true,
    "vocab_size": 248320
  },
  "tie_word_embeddings": true,
  "transformers_version": "5.2.0",
  "video_token_id": 248057,
  "vision_config": {
    "deepstack_visual_indexes": [],
    "depth": 24,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 1024,
    "in_channels": 3,
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "model_type": "qwen3_5",
    "num_heads": 16,
    "num_position_embeddings": 2304,
    "out_hidden_size": 2560,
    "patch_size": 16,
    "spatial_merge_size": 2,
    "temporal_patch_size": 2
  },
  "vision_end_token_id": 248054,
  "vision_start_token_id": 248053
}
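
A few relationships in the text config can be checked numerically. The sketch below rebuilds the `layer_types` list from `full_attention_interval` and derives the grouped-query ratio and rotary dimensions. The helper name is hypothetical, and the rule "every fourth layer uses full attention" is read off the list printed above rather than taken from the model code.

```python
NUM_HIDDEN_LAYERS = 32
FULL_ATTENTION_INTERVAL = 4
NUM_ATTENTION_HEADS = 16
NUM_KEY_VALUE_HEADS = 4
HEAD_DIM = 256
PARTIAL_ROTARY_FACTOR = 0.25
MROPE_SECTION = [11, 11, 10]


def build_layer_types(num_layers, interval):
    """Every `interval`-th layer is full attention; the rest are linear attention."""
    return [
        "full_attention" if (index + 1) % interval == 0 else "linear_attention"
        for index in range(num_layers)
    ]


layer_types = build_layer_types(NUM_HIDDEN_LAYERS, FULL_ATTENTION_INTERVAL)
full_layers = layer_types.count("full_attention")
linear_layers = layer_types.count("linear_attention")

# Grouped-query attention: query heads that share one key/value head.
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS

# Only a fraction of each head's dimensions receive rotary embeddings.
rotary_dims = int(HEAD_DIM * PARTIAL_ROTARY_FACTOR)

print(full_layers, linear_layers, queries_per_kv_head, rotary_dims, sum(MROPE_SECTION))
```

The `mrope_section` entries sum to 32, which is half of the 64 rotary dimensions per head.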

[INFO|2026-04-20 15:20:46] video_processing_utils.py:714 >> loading configuration file video_preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/video_preprocessor_config.json
[INFO|2026-04-20 15:20:46] video_processing_utils.py:714 >> loading configuration file video_preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/video_preprocessor_config.json
[INFO|2026-04-20 15:20:46] video_processing_utils.py:759 >> Video processor Qwen3VLVideoProcessor {
  "data_format": "channels_first",
  "default_to_square": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "do_sample_frames": true,
  "fps": 2,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "max_frames": 768,
  "merge_size": 2,
  "min_frames": 4,
  "patch_size": 16,
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "return_metadata": false,
  "size": {
    "longest_edge": 234881024,
    "shortest_edge": 4096
  },
  "temporal_patch_size": 2,
  "video_processor_type": "Qwen3VLVideoProcessor"
}
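
To make the `fps`, `min_frames`, and `max_frames` settings concrete, here is a rough sketch of how many frames might be sampled from a clip. It assumes a simple rule (target frame count from duration times `fps`, clamped to the configured bounds and to the clip length); the actual video processor may differ in detail, and `sampled_frame_count` is a hypothetical name.

```python
def sampled_frame_count(total_frames, source_fps, target_fps=2,
                        min_frames=4, max_frames=768):
    """Estimate the number of frames kept under the assumed clamp rule."""
    # Frames wanted at the target sampling rate.
    wanted = int(total_frames / source_fps * target_fps)
    # Clamp to the configured bounds, never exceeding the clip itself.
    return min(max(wanted, min_frames), max_frames, total_frames)


for frames in (30, 300, 108000):
    print(frames, sampled_frame_count(frames, source_fps=30))
```

Under this rule a 10 second clip at 30 fps keeps 20 frames, a 1 second clip is raised to the 4-frame minimum, and an hour-long clip is capped at 768.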

[INFO|2026-04-20 15:20:47] processing_utils.py:1170 >> Processor Qwen3VLProcessor:
- image_processor: Qwen2VLImageProcessorFast {
  "data_format": "channels_first",
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "Qwen2VLImageProcessorFast",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "merge_size": 2,
  "patch_size": 16,
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "longest_edge": 16777216,
    "shortest_edge": 65536
  },
  "temporal_patch_size": 2
}

- tokenizer: TokenizersBackend(name_or_path='Qwen/Qwen3.5-4B-Base', vocab_size=248044, model_max_length=262144, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'audio_bos_token': '<|audio_start|>', 'audio_eos_token': '<|audio_end|>', 'audio_token': '<|audio_pad|>', 'image_token': '<|image_pad|>', 'video_token': '<|video_pad|>', 'vision_bos_token': '<|vision_start|>', 'vision_eos_token': '<|vision_end|>'}, added_tokens_decoder={
	248044: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248045: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248046: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248047: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248048: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248049: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248050: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248051: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248052: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248053: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248054: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248055: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248056: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248057: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248058: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248059: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248060: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248061: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248062: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248063: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248064: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248065: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248066: AddedToken("<tool_response>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248067: AddedToken("</tool_response>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248068: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248069: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
	248070: AddedToken("<|audio_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248071: AddedToken("<|audio_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248072: AddedToken("<tts_pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248073: AddedToken("<tts_text_bos>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248074: AddedToken("<tts_text_eod>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248075: AddedToken("<tts_text_bos_single>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	248076: AddedToken("<|audio_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
)
- video_processor: Qwen3VLVideoProcessor {
  "data_format": "channels_first",
  "default_to_square": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "do_sample_frames": true,
  "fps": 2,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "max_frames": 768,
  "merge_size": 2,
  "min_frames": 4,
  "patch_size": 16,
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "return_metadata": false,
  "size": {
    "longest_edge": 234881024,
    "shortest_edge": 4096
  },
  "temporal_patch_size": 2,
  "video_processor_type": "Qwen3VLVideoProcessor"
}


{
  "image_processor": {
    "data_format": "channels_first",
    "do_convert_rgb": true,
    "do_normalize": true,
    "do_rescale": true,
    "do_resize": true,
    "image_mean": [
      0.5,
      0.5,
      0.5
    ],
    "image_processor_type": "Qwen2VLImageProcessorFast",
    "image_std": [
      0.5,
      0.5,
      0.5
    ],
    "merge_size": 2,
    "patch_size": 16,
    "resample": 3,
    "rescale_factor": 0.00392156862745098,
    "size": {
      "longest_edge": 16777216,
      "shortest_edge": 65536
    },
    "temporal_patch_size": 2
  },
  "processor_class": "Qwen3VLProcessor",
  "video_processor": {
    "data_format": "channels_first",
    "default_to_square": true,
    "do_convert_rgb": true,
    "do_normalize": true,
    "do_rescale": true,
    "do_resize": true,
    "do_sample_frames": true,
    "fps": 2,
    "image_mean": [
      0.5,
      0.5,
      0.5
    ],
    "image_std": [
      0.5,
      0.5,
      0.5
    ],
    "max_frames": 768,
    "merge_size": 2,
    "min_frames": 4,
    "patch_size": 16,
    "resample": 3,
    "rescale_factor": 0.00392156862745098,
    "return_metadata": false,
    "size": {
      "longest_edge": 234881024,
      "shortest_edge": 4096
    },
    "temporal_patch_size": 2,
    "video_processor_type": "Qwen3VLVideoProcessor"
  }
}
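
The tokenizer printout reports `vocab_size=248044`, while the model config lists `vocab_size` 248320. The arithmetic below, using only numbers from the log, shows how the two relate; the variable names are illustrative.

```python
TOKENIZER_VOCAB_SIZE = 248044  # base vocabulary, excluding added tokens
FIRST_ADDED_ID = 248044        # <|endoftext|>
LAST_ADDED_ID = 248076         # <|audio_pad|>
EMBEDDING_ROWS = 248320        # text_config.vocab_size

added_tokens = LAST_ADDED_ID - FIRST_ADDED_ID + 1
ids_in_use = LAST_ADDED_ID + 1
spare_rows = EMBEDDING_ROWS - ids_in_use

print(added_tokens, ids_in_use, spare_rows)
```

The 33 added tokens start exactly where the base vocabulary ends, so 248077 IDs are in use and the remaining 243 embedding rows have no tokenizer entry.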

[INFO|2026-04-20 15:20:47] logging.py:144 >> Replace eos token: <|im_end|>.
[INFO|2026-04-20 15:20:47] logging.py:144 >> Loading dataset pumlGenV3.json...
[INFO|2026-04-20 15:21:06] configuration_utils.py:670 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/config.json
[INFO|2026-04-20 15:21:06] configuration_utils.py:742 >> Model config Qwen3_5Config {
  "architectures": [
    "Qwen3_5ForConditionalGeneration"
  ],
  "image_token_id": 248056,
  "model_type": "qwen3_5",
  "text_config": {
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attn_output_gate": true,
    "bos_token_id": null,
    "dtype": "bfloat16",
    "eos_token_id": 248044,
    "full_attention_interval": 4,
    "head_dim": 256,
    "hidden_act": "silu",
    "hidden_size": 2560,
    "initializer_range": 0.02,
    "intermediate_size": 9216,
    "layer_types": [
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention",
      "linear_attention",
      "linear_attention",
      "linear_attention",
      "full_attention"
    ],
    "linear_conv_kernel_dim": 4,
    "linear_key_head_dim": 128,
    "linear_num_key_heads": 16,
    "linear_num_value_heads": 32,
    "linear_value_head_dim": 128,
    "mamba_ssm_dtype": "float32",
    "max_position_embeddings": 262144,
    "mlp_only_layers": [],
    "model_type": "qwen3_5_text",
    "mtp_num_hidden_layers": 1,
    "mtp_use_dedicated_embeddings": false,
    "num_attention_heads": 16,
    "num_hidden_layers": 32,
    "num_key_value_heads": 4,
    "pad_token_id": null,
    "partial_rotary_factor": 0.25,
    "rms_norm_eps": 1e-06,
    "rope_parameters": {
      "mrope_interleaved": true,
      "mrope_section": [
        11,
        11,
        10
      ],
      "partial_rotary_factor": 0.25,
      "rope_theta": 10000000,
      "rope_type": "default"
    },
    "tie_word_embeddings": true,
    "use_cache": true,
    "vocab_size": 248320
  },
  "tie_word_embeddings": true,
  "transformers_version": "5.2.0",
  "video_token_id": 248057,
  "vision_config": {
    "deepstack_visual_indexes": [],
    "depth": 24,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 1024,
    "in_channels": 3,
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "model_type": "qwen3_5",
    "num_heads": 16,
    "num_position_embeddings": 2304,
    "out_hidden_size": 2560,
    "patch_size": 16,
    "spatial_merge_size": 2,
    "temporal_patch_size": 2
  },
  "vision_end_token_id": 248054,
  "vision_start_token_id": 248053
}

[INFO|2026-04-20 15:21:06] logging.py:144 >> KV cache is disabled during training.
[INFO|2026-04-20 15:21:07] modeling_utils.py:710 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/model.safetensors.index.json
[INFO|2026-04-20 15:21:07] modeling_utils.py:790 >> Since the `dtype` attribute can't be found in model's config object, will use dtype={dtype} as derived from model's weights
[INFO|2026-04-20 15:21:07] modeling_utils.py:3560 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[INFO|2026-04-20 15:21:07] configuration_utils.py:1014 >> Generate config GenerationConfig {
  "output_attentions": false,
  "output_hidden_states": false,
  "use_cache": false
}

[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
[WARNING|2026-04-20 15:21:07] logging.py:327 >> The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
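
The warning above means the linear-attention layers ran on the slower pure-PyTorch path. A quick way to check an environment before training is sketched below. The import names `fla` and `causal_conv1d` are assumptions about how the two linked projects are exposed as Python modules, and `missing_fast_path_packages` is a hypothetical helper.

```python
import importlib.util

# Project name -> assumed top-level import name.
OPTIONAL_KERNELS = {
    "flash-linear-attention": "fla",
    "causal-conv1d": "causal_conv1d",
}


def missing_fast_path_packages():
    """Return the optional kernel projects that cannot be imported here."""
    return [
        project
        for project, module in OPTIONAL_KERNELS.items()
        if importlib.util.find_spec(module) is None
    ]


print(missing_fast_path_packages())
```

An empty list suggests both packages are importable; anything listed would need installing by following the links in the warning.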
[INFO|2026-04-20 15:21:29] utils.py:411 >> Generation config file not found, using a generation config created from the model config.
[INFO|2026-04-20 15:21:29] configuration_utils.py:967 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen3.5-4B-Base/snapshots/57370f0ea82c3cca33558a95212e032c344e5fd5/config.json
[INFO|2026-04-20 15:21:29] configuration_utils.py:1014 >> Generate config GenerationConfig {}

[INFO|2026-04-20 15:21:29] dynamic_module_utils.py:406 >> Could not locate the custom_generate/generate.py inside Qwen/Qwen3.5-4B-Base.
[INFO|2026-04-20 15:21:29] logging.py:144 >> Gradient checkpointing enabled.
[INFO|2026-04-20 15:21:29] logging.py:144 >> Using torch SDPA for faster training and inference.
[INFO|2026-04-20 15:21:29] logging.py:144 >> DeepSpeed ZeRO3 detected, remaining trainable params in float32.
[INFO|2026-04-20 15:21:29] logging.py:144 >> Fine-tuning method: Full
[INFO|2026-04-20 15:21:29] logging.py:144 >> Set vision model not trainable: ['visual.pos_embed', 'visual.patch_embed', 'visual.blocks'].
[INFO|2026-04-20 15:21:29] logging.py:144 >> Set multi model projector not trainable: ['model.visual.merger'].
[INFO|2026-04-20 15:21:29] logging.py:144 >> trainable params: 4,205,751,296 || all params: 4,539,265,536 || trainable%: 92.6527
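
The trainable percentage follows directly from the two parameter counts; the difference is the frozen part, which per the preceding lines is the vision model and the multimodal projector. A minimal check:

```python
TRAINABLE_PARAMS = 4_205_751_296
ALL_PARAMS = 4_539_265_536

frozen_params = ALL_PARAMS - TRAINABLE_PARAMS
trainable_percent = 100 * TRAINABLE_PARAMS / ALL_PARAMS

print(f"frozen: {frozen_params:,}")
print(f"trainable%: {trainable_percent:.4f}")
```

This gives 333,514,240 frozen parameters, so "Full" fine-tuning here updates the language model while leaving the vision side untouched.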
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
[WARNING|2026-04-20 15:21:29] trainer_utils.py:1234 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
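
The updated IDs in this warning can be matched against the tokenizer's added-token table printed earlier. The sketch below only restates values from the log; `describe` is a hypothetical helper.

```python
# Subset of the added-token table from the tokenizer printout.
ADDED_TOKENS = {
    248044: "<|endoftext|>",
    248045: "<|im_start|>",
    248046: "<|im_end|>",
}

BASE_CONFIG_IDS = {"eos_token_id": 248044, "pad_token_id": None}
UPDATED_IDS = {"eos_token_id": 248046, "pad_token_id": 248044}


def describe(token_ids):
    """Translate token IDs to their string form (None if unset or unknown)."""
    return {name: ADDED_TOKENS.get(token_id) for name, token_id in token_ids.items()}


print(describe(BASE_CONFIG_IDS))
print(describe(UPDATED_IDS))
```

After the update, end-of-sequence is `<|im_end|>` (matching the "Replace eos token" message) and padding uses `<|endoftext|>`. When generating with the fine-tuned model, it is worth confirming that the saved generation config carries the same `eos_token_id`.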
[INFO|2026-04-20 15:21:30] trainer.py:1198 >> skipped Embedding(2304, 1024): 0.0M params
[INFO|2026-04-20 15:21:30] trainer.py:1198 >> skipped Embedding(248320, 2560): 0.0M params
[INFO|2026-04-20 15:21:30] trainer.py:1201 >> skipped: 0.0M params
[INFO|2026-04-20 15:21:36] trainer.py:1587 >> ***** Running training *****
[INFO|2026-04-20 15:21:36] trainer.py:1588 >>   Num examples = 38,538
[INFO|2026-04-20 15:21:36] trainer.py:1589 >>   Num Epochs = 3
[INFO|2026-04-20 15:21:36] trainer.py:1590 >>   Instantaneous batch size per device = 1
[INFO|2026-04-20 15:21:36] trainer.py:1593 >>   Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|2026-04-20 15:21:36] trainer.py:1594 >>   Gradient Accumulation steps = 16
[INFO|2026-04-20 15:21:36] trainer.py:1595 >>   Total optimization steps = 906
[INFO|2026-04-20 15:21:36] trainer.py:1596 >>   Number of trainable parameters = 4,205,751,296
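
The batch and step figures above are mutually consistent, and the device count can be recovered from them. The sketch below shows one way to reproduce the logged numbers; rounding the per-epoch step count up is an assumption chosen because it matches the reported total.

```python
import math

NUM_EXAMPLES = 38_538
NUM_EPOCHS = 3
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 16
TOTAL_BATCH = 128

# total batch = per-device batch * accumulation steps * number of devices
num_devices = TOTAL_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM_STEPS)

# The final partial batch of each epoch still counts as one optimizer step.
steps_per_epoch = math.ceil(NUM_EXAMPLES / TOTAL_BATCH)
total_steps = steps_per_epoch * NUM_EPOCHS

print(num_devices, steps_per_epoch, total_steps)
```

This yields 8 devices, 302 steps per epoch, and 906 optimization steps in total.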
[INFO|2026-04-20 15:24:13] logging.py:144 >> {'loss': 0.5379, 'learning_rate': 5.0000e-05, 'epoch': 0.00, 'throughput': 1341.58}
[INFO|2026-04-20 15:27:32] logging.py:144 >> {'loss': 1.0608, 'learning_rate': 5.0000e-05, 'epoch': 0.01, 'throughput': 1153.75}
[INFO|2026-04-20 15:32:59] logging.py:144 >> {'loss': 0.9687, 'learning_rate': 4.9999e-05, 'epoch': 0.01, 'throughput': 898.89}
[INFO|2026-04-20 15:39:50] logging.py:144 >> {'loss': 0.9387, 'learning_rate': 4.9999e-05, 'epoch': 0.01, 'throughput': 756.93}
[INFO|2026-04-20 15:45:44] logging.py:144 >> {'loss': 0.7523, 'learning_rate': 4.9998e-05, 'epoch': 0.02, 'throughput': 712.51}
[INFO|2026-04-20 15:52:26] logging.py:144 >> {'loss': 0.6717, 'learning_rate': 4.9996e-05, 'epoch': 0.02, 'throughput': 672.39}
[INFO|2026-04-20 15:59:06] logging.py:144 >> {'loss': 0.5992, 'learning_rate': 4.9995e-05, 'epoch': 0.02, 'throughput': 648.85}
[INFO|2026-04-20 16:05:18] logging.py:144 >> {'loss': 0.5814, 'learning_rate': 4.9993e-05, 'epoch': 0.03, 'throughput': 638.23}
[INFO|2026-04-20 16:11:07] logging.py:144 >> {'loss': 0.5496, 'learning_rate': 4.9990e-05, 'epoch': 0.03, 'throughput': 633.37}
[INFO|2026-04-20 16:16:57] logging.py:144 >> {'loss': 0.5153, 'learning_rate': 4.9988e-05, 'epoch': 0.03, 'throughput': 628.92}
[INFO|2026-04-20 16:22:58] logging.py:144 >> {'loss': 0.5036, 'learning_rate': 4.9985e-05, 'epoch': 0.04, 'throughput': 623.00}
[INFO|2026-04-20 16:28:52] logging.py:144 >> {'loss': 0.4851, 'learning_rate': 4.9982e-05, 'epoch': 0.04, 'throughput': 621.26}
[INFO|2026-04-20 16:34:32] logging.py:144 >> {'loss': 0.4943, 'learning_rate': 4.9978e-05, 'epoch': 0.04, 'throughput': 621.17}
[INFO|2026-04-20 16:40:45] logging.py:144 >> {'loss': 0.4779, 'learning_rate': 4.9975e-05, 'epoch': 0.05, 'throughput': 616.18}
[INFO|2026-04-20 16:47:25] logging.py:144 >> {'loss': 0.4823, 'learning_rate': 4.9971e-05, 'epoch': 0.05, 'throughput': 609.07}
[INFO|2026-04-20 16:53:57] logging.py:144 >> {'loss': 0.4785, 'learning_rate': 4.9966e-05, 'epoch': 0.05, 'throughput': 604.62}
[INFO|2026-04-20 17:00:22] logging.py:144 >> {'loss': 0.4880, 'learning_rate': 4.9962e-05, 'epoch': 0.06, 'throughput': 601.09}
[INFO|2026-04-20 17:06:55] logging.py:144 >> {'loss': 0.4697, 'learning_rate': 4.9957e-05, 'epoch': 0.06, 'throughput': 597.32}
[INFO|2026-04-20 17:13:37] logging.py:144 >> {'loss': 0.4697, 'learning_rate': 4.9951e-05, 'epoch': 0.06, 'throughput': 592.86}
[INFO|2026-04-20 17:20:24] logging.py:144 >> {'loss': 0.4789, 'learning_rate': 4.9946e-05, 'epoch': 0.07, 'throughput': 588.42}
[INFO|2026-04-20 17:26:56] logging.py:144 >> {'loss': 0.4772, 'learning_rate': 4.9940e-05, 'epoch': 0.07, 'throughput': 585.52}
[INFO|2026-04-20 17:33:04] logging.py:144 >> {'loss': 0.4744, 'learning_rate': 4.9934e-05, 'epoch': 0.07, 'throughput': 584.99}
[INFO|2026-04-20 17:39:20] logging.py:144 >> {'loss': 0.4629, 'learning_rate': 4.9927e-05, 'epoch': 0.08, 'throughput': 583.71}
[INFO|2026-04-20 17:46:14] logging.py:144 >> {'loss': 0.4645, 'learning_rate': 4.9921e-05, 'epoch': 0.08, 'throughput': 579.99}
[INFO|2026-04-20 17:52:13] logging.py:144 >> {'loss': 0.4628, 'learning_rate': 4.9913e-05, 'epoch': 0.08, 'throughput': 579.90}
[INFO|2026-04-20 17:58:24] logging.py:144 >> {'loss': 0.4780, 'learning_rate': 4.9906e-05, 'epoch': 0.09, 'throughput': 579.31}
[INFO|2026-04-20 18:05:25] logging.py:144 >> {'loss': 0.4747, 'learning_rate': 4.9898e-05, 'epoch': 0.09, 'throughput': 575.66}
[INFO|2026-04-20 18:11:45] logging.py:144 >> {'loss': 0.4631, 'learning_rate': 4.9891e-05, 'epoch': 0.09, 'throughput': 574.23}
[INFO|2026-04-20 18:18:16] logging.py:144 >> {'loss': 0.4600, 'learning_rate': 4.9882e-05, 'epoch': 0.10, 'throughput': 572.61}
[INFO|2026-04-20 18:25:16] logging.py:144 >> {'loss': 0.4782, 'learning_rate': 4.9874e-05, 'epoch': 0.10, 'throughput': 569.62}
[INFO|2026-04-20 18:31:50] logging.py:144 >> {'loss': 0.4618, 'learning_rate': 4.9865e-05, 'epoch': 0.10, 'throughput': 568.54}
[INFO|2026-04-20 18:38:42] logging.py:144 >> {'loss': 0.4520, 'learning_rate': 4.9856e-05, 'epoch': 0.11, 'throughput': 566.02}
[INFO|2026-04-20 18:44:52] logging.py:144 >> {'loss': 0.4566, 'learning_rate': 4.9846e-05, 'epoch': 0.11, 'throughput': 566.56}
[INFO|2026-04-20 18:51:40] logging.py:144 >> {'loss': 0.4522, 'learning_rate': 4.9837e-05, 'epoch': 0.11, 'throughput': 564.74}
[INFO|2026-04-20 18:58:50] logging.py:144 >> {'loss': 0.4630, 'learning_rate': 4.9826e-05, 'epoch': 0.12, 'throughput': 562.11}
[INFO|2026-04-20 19:06:20] logging.py:144 >> {'loss': 0.4690, 'learning_rate': 4.9816e-05, 'epoch': 0.12, 'throughput': 558.94}
[INFO|2026-04-20 19:13:55] logging.py:144 >> {'loss': 0.4629, 'learning_rate': 4.9805e-05, 'epoch': 0.12, 'throughput': 555.84}
[INFO|2026-04-20 19:20:58] logging.py:144 >> {'loss': 0.4757, 'learning_rate': 4.9795e-05, 'epoch': 0.13, 'throughput': 554.00}
[INFO|2026-04-20 19:27:24] logging.py:144 >> {'loss': 0.4495, 'learning_rate': 4.9783e-05, 'epoch': 0.13, 'throughput': 553.99}
[INFO|2026-04-20 19:34:20] logging.py:144 >> {'loss': 0.4762, 'learning_rate': 4.9772e-05, 'epoch': 0.13, 'throughput': 552.28}
[INFO|2026-04-20 19:41:33] logging.py:144 >> {'loss': 0.4603, 'learning_rate': 4.9760e-05, 'epoch': 0.14, 'throughput': 550.03}
[INFO|2026-04-20 19:48:22] logging.py:144 >> {'loss': 0.4573, 'learning_rate': 4.9748e-05, 'epoch': 0.14, 'throughput': 549.32}
[INFO|2026-04-20 19:55:28] logging.py:144 >> {'loss': 0.4572, 'learning_rate': 4.9735e-05, 'epoch': 0.14, 'throughput': 547.79}
[INFO|2026-04-20 20:01:20] logging.py:144 >> {'loss': 0.4637, 'learning_rate': 4.9723e-05, 'epoch': 0.15, 'throughput': 548.42}
[INFO|2026-04-20 20:07:34] logging.py:144 >> {'loss': 0.4485, 'learning_rate': 4.9710e-05, 'epoch': 0.15, 'throughput': 548.68}
[INFO|2026-04-20 20:14:24] logging.py:144 >> {'loss': 0.4744, 'learning_rate': 4.9696e-05, 'epoch': 0.15, 'throughput': 548.08}
[INFO|2026-04-20 20:21:01] logging.py:144 >> {'loss': 0.4498, 'learning_rate': 4.9683e-05, 'epoch': 0.16, 'throughput': 547.87}
[INFO|2026-04-20 20:27:30] logging.py:144 >> {'loss': 0.4402, 'learning_rate': 4.9669e-05, 'epoch': 0.16, 'throughput': 547.67}
[INFO|2026-04-20 20:34:43] logging.py:144 >> {'loss': 0.4502, 'learning_rate': 4.9655e-05, 'epoch': 0.16, 'throughput': 546.25}
[INFO|2026-04-20 20:41:35] logging.py:144 >> {'loss': 0.4449, 'learning_rate': 4.9640e-05, 'epoch': 0.17, 'throughput': 545.57}
[INFO|2026-04-20 20:48:41] logging.py:144 >> {'loss': 0.4647, 'learning_rate': 4.9625e-05, 'epoch': 0.17, 'throughput': 544.43}
[INFO|2026-04-20 20:55:09] logging.py:144 >> {'loss': 0.4434, 'learning_rate': 4.9610e-05, 'epoch': 0.17, 'throughput': 544.24}
[INFO|2026-04-20 21:02:11] logging.py:144 >> {'loss': 0.4552, 'learning_rate': 4.9595e-05, 'epoch': 0.18, 'throughput': 543.20}
[INFO|2026-04-20 21:09:11] logging.py:144 >> {'loss': 0.4416, 'learning_rate': 4.9579e-05, 'epoch': 0.18, 'throughput': 542.16}
[INFO|2026-04-20 21:16:01] logging.py:144 >> {'loss': 0.4537, 'learning_rate': 4.9563e-05, 'epoch': 0.18, 'throughput': 541.88}
[INFO|2026-04-20 21:22:25] logging.py:144 >> {'loss': 0.4557, 'learning_rate': 4.9547e-05, 'epoch': 0.19, 'throughput': 542.10}
[INFO|2026-04-20 21:29:00] logging.py:144 >> {'loss': 0.4614, 'learning_rate': 4.9530e-05, 'epoch': 0.19, 'throughput': 541.98}
[INFO|2026-04-20 21:35:34] logging.py:144 >> {'loss': 0.4480, 'learning_rate': 4.9513e-05, 'epoch': 0.19, 'throughput': 541.84}
[INFO|2026-04-20 21:41:43] logging.py:144 >> {'loss': 0.4594, 'learning_rate': 4.9496e-05, 'epoch': 0.20, 'throughput': 542.39}
[INFO|2026-04-20 21:48:06] logging.py:144 >> {'loss': 0.4393, 'learning_rate': 4.9479e-05, 'epoch': 0.20, 'throughput': 542.75}
[INFO|2026-04-20 21:54:31] logging.py:144 >> {'loss': 0.4528, 'learning_rate': 4.9461e-05, 'epoch': 0.20, 'throughput': 542.86}
[INFO|2026-04-20 22:02:09] logging.py:144 >> {'loss': 0.4559, 'learning_rate': 4.9443e-05, 'epoch': 0.21, 'throughput': 541.49}
[INFO|2026-04-20 22:09:30] logging.py:144 >> {'loss': 0.4451, 'learning_rate': 4.9424e-05, 'epoch': 0.21, 'throughput': 540.28}
[INFO|2026-04-20 22:16:56] logging.py:144 >> {'loss': 0.4447, 'learning_rate': 4.9406e-05, 'epoch': 0.21, 'throughput': 539.23}
[INFO|2026-04-20 22:24:14] logging.py:144 >> {'loss': 0.4603, 'learning_rate': 4.9387e-05, 'epoch': 0.22, 'throughput': 538.21}
[INFO|2026-04-20 22:31:33] logging.py:144 >> {'loss': 0.4351, 'learning_rate': 4.9368e-05, 'epoch': 0.22, 'throughput': 537.00}
[INFO|2026-04-20 22:38:29] logging.py:144 >> {'loss': 0.4550, 'learning_rate': 4.9348e-05, 'epoch': 0.22, 'throughput': 536.25}
[INFO|2026-04-20 22:45:22] logging.py:144 >> {'loss': 0.4438, 'learning_rate': 4.9328e-05, 'epoch': 0.23, 'throughput': 535.56}
[INFO|2026-04-20 22:52:27] logging.py:144 >> {'loss': 0.4453, 'learning_rate': 4.9308e-05, 'epoch': 0.23, 'throughput': 535.15}
[INFO|2026-04-20 22:59:29] logging.py:144 >> {'loss': 0.4533, 'learning_rate': 4.9288e-05, 'epoch': 0.23, 'throughput': 534.68}
[INFO|2026-04-20 23:07:10] logging.py:144 >> {'loss': 0.4610, 'learning_rate': 4.9267e-05, 'epoch': 0.24, 'throughput': 533.30}
[INFO|2026-04-20 23:15:07] logging.py:144 >> {'loss': 0.4467, 'learning_rate': 4.9246e-05, 'epoch': 0.24, 'throughput': 531.81}
[INFO|2026-04-20 23:22:04] logging.py:144 >> {'loss': 0.4448, 'learning_rate': 4.9225e-05, 'epoch': 0.24, 'throughput': 531.40}
[INFO|2026-04-20 23:28:00] logging.py:144 >> {'loss': 0.4391, 'learning_rate': 4.9203e-05, 'epoch': 0.25, 'throughput': 532.04}
[INFO|2026-04-20 23:34:22] logging.py:144 >> {'loss': 0.4451, 'learning_rate': 4.9181e-05, 'epoch': 0.25, 'throughput': 532.17}
[INFO|2026-04-20 23:40:07] logging.py:144 >> {'loss': 0.4539, 'learning_rate': 4.9159e-05, 'epoch': 0.25, 'throughput': 532.84}
[INFO|2026-04-20 23:46:30] logging.py:144 >> {'loss': 0.4480, 'learning_rate': 4.9137e-05, 'epoch': 0.26, 'throughput': 533.11}
[INFO|2026-04-20 23:53:12] logging.py:144 >> {'loss': 0.4483, 'learning_rate': 4.9114e-05, 'epoch': 0.26, 'throughput': 532.89}
[INFO|2026-04-20 23:59:28] logging.py:144 >> {'loss': 0.4381, 'learning_rate': 4.9091e-05, 'epoch': 0.26, 'throughput': 533.53}
[INFO|2026-04-21 00:05:40] logging.py:144 >> {'loss': 0.4483, 'learning_rate': 4.9068e-05, 'epoch': 0.27, 'throughput': 533.88}
[INFO|2026-04-21 00:10:55] logging.py:144 >> {'loss': 0.4353, 'learning_rate': 4.9044e-05, 'epoch': 0.27, 'throughput': 535.26}
[INFO|2026-04-21 00:16:57] logging.py:144 >> {'loss': 0.4635, 'learning_rate': 4.9020e-05, 'epoch': 0.27, 'throughput': 535.72}
[INFO|2026-04-21 00:22:56] logging.py:144 >> {'loss': 0.4414, 'learning_rate': 4.8996e-05, 'epoch': 0.28, 'throughput': 536.14}
[INFO|2026-04-21 00:29:35] logging.py:144 >> {'loss': 0.4479, 'learning_rate': 4.8972e-05, 'epoch': 0.28, 'throughput': 536.13}
[INFO|2026-04-21 00:35:22] logging.py:144 >> {'loss': 0.4464, 'learning_rate': 4.8947e-05, 'epoch': 0.28, 'throughput': 536.94}
[INFO|2026-04-21 00:41:25] logging.py:144 >> {'loss': 0.4391, 'learning_rate': 4.8922e-05, 'epoch': 0.29, 'throughput': 537.53}
[INFO|2026-04-21 00:47:07] logging.py:144 >> {'loss': 0.4581, 'learning_rate': 4.8897e-05, 'epoch': 0.29, 'throughput': 538.27}
[INFO|2026-04-21 00:53:12] logging.py:144 >> {'loss': 0.4442, 'learning_rate': 4.8871e-05, 'epoch': 0.29, 'throughput': 538.95}
[INFO|2026-04-21 00:59:15] logging.py:144 >> {'loss': 0.4445, 'learning_rate': 4.8845e-05, 'epoch': 0.30, 'throughput': 539.35}
[INFO|2026-04-21 01:04:50] logging.py:144 >> {'loss': 0.4376, 'learning_rate': 4.8819e-05, 'epoch': 0.30, 'throughput': 540.26}
[INFO|2026-04-21 01:10:11] logging.py:144 >> {'loss': 0.4570, 'learning_rate': 4.8792e-05, 'epoch': 0.30, 'throughput': 541.02}
[INFO|2026-04-21 01:15:26] logging.py:144 >> {'loss': 0.4444, 'learning_rate': 4.8766e-05, 'epoch': 0.31, 'throughput': 542.17}
[INFO|2026-04-21 01:20:45] logging.py:144 >> {'loss': 0.4561, 'learning_rate': 4.8739e-05, 'epoch': 0.31, 'throughput': 543.30}
[INFO|2026-04-21 01:25:41] logging.py:144 >> {'loss': 0.4594, 'learning_rate': 4.8711e-05, 'epoch': 0.31, 'throughput': 544.61}
[INFO|2026-04-21 01:30:45] logging.py:144 >> {'loss': 0.4482, 'learning_rate': 4.8684e-05, 'epoch': 0.32, 'throughput': 545.87}
[INFO|2026-04-21 01:35:51] logging.py:144 >> {'loss': 0.4578, 'learning_rate': 4.8656e-05, 'epoch': 0.32, 'throughput': 547.09}
[INFO|2026-04-21 01:41:06] logging.py:144 >> {'loss': 0.4522, 'learning_rate': 4.8628e-05, 'epoch': 0.32, 'throughput': 548.02}
[INFO|2026-04-21 01:46:34] logging.py:144 >> {'loss': 0.4573, 'learning_rate': 4.8599e-05, 'epoch': 0.33, 'throughput': 548.64}
[INFO|2026-04-21 01:52:17] logging.py:144 >> {'loss': 0.4349, 'learning_rate': 4.8570e-05, 'epoch': 0.33, 'throughput': 549.06}
[INFO|2026-04-21 01:57:50] logging.py:144 >> {'loss': 0.4540, 'learning_rate': 4.8541e-05, 'epoch': 0.33, 'throughput': 549.58}
[INFO|2026-04-21 02:03:22] logging.py:144 >> {'loss': 0.4433, 'learning_rate': 4.8512e-05, 'epoch': 0.34, 'throughput': 550.25}
[INFO|2026-04-21 02:09:01] logging.py:144 >> {'loss': 0.4451, 'learning_rate': 4.8482e-05, 'epoch': 0.34, 'throughput': 550.94}
[INFO|2026-04-21 02:14:22] logging.py:144 >> {'loss': 0.4406, 'learning_rate': 4.8453e-05, 'epoch': 0.34, 'throughput': 551.76}
[INFO|2026-04-21 02:20:22] logging.py:144 >> {'loss': 0.4419, 'learning_rate': 4.8422e-05, 'epoch': 0.35, 'throughput': 552.06}
[INFO|2026-04-21 02:27:00] logging.py:144 >> {'loss': 0.4270, 'learning_rate': 4.8392e-05, 'epoch': 0.35, 'throughput': 551.84}
[INFO|2026-04-21 02:33:11] logging.py:144 >> {'loss': 0.4404, 'learning_rate': 4.8361e-05, 'epoch': 0.35, 'throughput': 552.18}
[INFO|2026-04-21 02:39:16] logging.py:144 >> {'loss': 0.4441, 'learning_rate': 4.8330e-05, 'epoch': 0.36, 'throughput': 552.53}
[INFO|2026-04-21 02:44:52] logging.py:144 >> {'loss': 0.4441, 'learning_rate': 4.8299e-05, 'epoch': 0.36, 'throughput': 553.11}
[INFO|2026-04-21 02:50:23] logging.py:144 >> {'loss': 0.4447, 'learning_rate': 4.8267e-05, 'epoch': 0.36, 'throughput': 553.68}
[INFO|2026-04-21 02:56:18] logging.py:144 >> {'loss': 0.4410, 'learning_rate': 4.8235e-05, 'epoch': 0.37, 'throughput': 554.00}
[INFO|2026-04-21 03:01:55] logging.py:144 >> {'loss': 0.4509, 'learning_rate': 4.8203e-05, 'epoch': 0.37, 'throughput': 554.53}
[INFO|2026-04-21 03:08:30] logging.py:144 >> {'loss': 0.4546, 'learning_rate': 4.8171e-05, 'epoch': 0.37, 'throughput': 554.42}
[INFO|2026-04-21 03:15:20] logging.py:144 >> {'loss': 0.4555, 'learning_rate': 4.8138e-05, 'epoch': 0.38, 'throughput': 554.08}
[INFO|2026-04-21 03:21:56] logging.py:144 >> {'loss': 0.4397, 'learning_rate': 4.8105e-05, 'epoch': 0.38, 'throughput': 553.99}
[INFO|2026-04-21 03:27:31] logging.py:144 >> {'loss': 0.4478, 'learning_rate': 4.8072e-05, 'epoch': 0.38, 'throughput': 554.50}
[INFO|2026-04-21 03:33:20] logging.py:144 >> {'loss': 0.4334, 'learning_rate': 4.8039e-05, 'epoch': 0.39, 'throughput': 554.86}
[INFO|2026-04-21 03:39:54] logging.py:144 >> {'loss': 0.4371, 'learning_rate': 4.8005e-05, 'epoch': 0.39, 'throughput': 554.57}
[INFO|2026-04-21 03:46:34] logging.py:144 >> {'loss': 0.4441, 'learning_rate': 4.7971e-05, 'epoch': 0.39, 'throughput': 554.23}
[INFO|2026-04-21 03:52:31] logging.py:144 >> {'loss': 0.4354, 'learning_rate': 4.7936e-05, 'epoch': 0.40, 'throughput': 554.36}
[INFO|2026-04-21 03:58:46] logging.py:144 >> {'loss': 0.4524, 'learning_rate': 4.7902e-05, 'epoch': 0.40, 'throughput': 554.35}
[INFO|2026-04-21 04:05:07] logging.py:144 >> {'loss': 0.4325, 'learning_rate': 4.7867e-05, 'epoch': 0.40, 'throughput': 554.36}
[INFO|2026-04-21 04:11:05] logging.py:144 >> {'loss': 0.4352, 'learning_rate': 4.7832e-05, 'epoch': 0.41, 'throughput': 554.49}
[INFO|2026-04-21 04:17:35] logging.py:144 >> {'loss': 0.4528, 'learning_rate': 4.7796e-05, 'epoch': 0.41, 'throughput': 554.45}
[INFO|2026-04-21 04:23:59] logging.py:144 >> {'loss': 0.4297, 'learning_rate': 4.7760e-05, 'epoch': 0.41, 'throughput': 554.43}
[INFO|2026-04-21 04:29:38] logging.py:144 >> {'loss': 0.4437, 'learning_rate': 4.7724e-05, 'epoch': 0.42, 'throughput': 554.84}
[INFO|2026-04-21 04:35:20] logging.py:144 >> {'loss': 0.4540, 'learning_rate': 4.7688e-05, 'epoch': 0.42, 'throughput': 555.17}
[INFO|2026-04-21 04:41:49] logging.py:144 >> {'loss': 0.4468, 'learning_rate': 4.7652e-05, 'epoch': 0.42, 'throughput': 555.07}
[INFO|2026-04-21 04:48:01] logging.py:144 >> {'loss': 0.4414, 'learning_rate': 4.7615e-05, 'epoch': 0.43, 'throughput': 555.26}
[INFO|2026-04-21 04:53:57] logging.py:144 >> {'loss': 0.4526, 'learning_rate': 4.7578e-05, 'epoch': 0.43, 'throughput': 555.45}
[INFO|2026-04-21 05:00:35] logging.py:144 >> {'loss': 0.4472, 'learning_rate': 4.7540e-05, 'epoch': 0.43, 'throughput': 555.31}
[INFO|2026-04-21 05:07:24] logging.py:144 >> {'loss': 0.4473, 'learning_rate': 4.7503e-05, 'epoch': 0.44, 'throughput': 555.03}
[INFO|2026-04-21 05:13:25] logging.py:144 >> {'loss': 0.4431, 'learning_rate': 4.7465e-05, 'epoch': 0.44, 'throughput': 555.21}
[INFO|2026-04-21 05:19:37] logging.py:144 >> {'loss': 0.4419, 'learning_rate': 4.7427e-05, 'epoch': 0.44, 'throughput': 555.30}
[INFO|2026-04-21 05:25:44] logging.py:144 >> {'loss': 0.4380, 'learning_rate': 4.7388e-05, 'epoch': 0.44, 'throughput': 555.39}
[INFO|2026-04-21 05:31:28] logging.py:144 >> {'loss': 0.4596, 'learning_rate': 4.7349e-05, 'epoch': 0.45, 'throughput': 555.75}
[INFO|2026-04-21 05:37:31] logging.py:144 >> {'loss': 0.4443, 'learning_rate': 4.7310e-05, 'epoch': 0.45, 'throughput': 555.85}
[INFO|2026-04-21 05:43:35] logging.py:144 >> {'loss': 0.4355, 'learning_rate': 4.7271e-05, 'epoch': 0.45, 'throughput': 556.07}
[INFO|2026-04-21 05:49:28] logging.py:144 >> {'loss': 0.4473, 'learning_rate': 4.7232e-05, 'epoch': 0.46, 'throughput': 556.39}
[INFO|2026-04-21 05:55:57] logging.py:144 >> {'loss': 0.4348, 'learning_rate': 4.7192e-05, 'epoch': 0.46, 'throughput': 556.25}
[INFO|2026-04-21 06:02:43] logging.py:144 >> {'loss': 0.4415, 'learning_rate': 4.7152e-05, 'epoch': 0.46, 'throughput': 555.81}
[INFO|2026-04-21 06:08:40] logging.py:144 >> {'loss': 0.4362, 'learning_rate': 4.7112e-05, 'epoch': 0.47, 'throughput': 555.97}
[INFO|2026-04-21 06:14:57] logging.py:144 >> {'loss': 0.4472, 'learning_rate': 4.7071e-05, 'epoch': 0.47, 'throughput': 556.06}
[INFO|2026-04-21 06:21:16] logging.py:144 >> {'loss': 0.4574, 'learning_rate': 4.7030e-05, 'epoch': 0.47, 'throughput': 555.94}
[INFO|2026-04-21 06:27:20] logging.py:144 >> {'loss': 0.4447, 'learning_rate': 4.6989e-05, 'epoch': 0.48, 'throughput': 556.06}
[INFO|2026-04-21 06:33:34] logging.py:144 >> {'loss': 0.4435, 'learning_rate': 4.6948e-05, 'epoch': 0.48, 'throughput': 556.06}
[INFO|2026-04-21 06:39:33] logging.py:144 >> {'loss': 0.4252, 'learning_rate': 4.6906e-05, 'epoch': 0.48, 'throughput': 556.35}
[INFO|2026-04-21 06:45:31] logging.py:144 >> {'loss': 0.4439, 'learning_rate': 4.6864e-05, 'epoch': 0.49, 'throughput': 556.48}
[INFO|2026-04-21 06:52:00] logging.py:144 >> {'loss': 0.4436, 'learning_rate': 4.6822e-05, 'epoch': 0.49, 'throughput': 556.48}
[INFO|2026-04-21 06:58:10] logging.py:144 >> {'loss': 0.4343, 'learning_rate': 4.6779e-05, 'epoch': 0.49, 'throughput': 556.48}
[INFO|2026-04-21 07:04:04] logging.py:144 >> {'loss': 0.4306, 'learning_rate': 4.6737e-05, 'epoch': 0.50, 'throughput': 556.80}
[INFO|2026-04-21 07:09:59] logging.py:144 >> {'loss': 0.4516, 'learning_rate': 4.6694e-05, 'epoch': 0.50, 'throughput': 556.90}
[INFO|2026-04-21 07:15:19] logging.py:144 >> {'loss': 0.4434, 'learning_rate': 4.6651e-05, 'epoch': 0.50, 'throughput': 557.47}
[INFO|2026-04-21 07:21:21] logging.py:144 >> {'loss': 0.4294, 'learning_rate': 4.6607e-05, 'epoch': 0.51, 'throughput': 557.71}
[INFO|2026-04-21 07:26:56] logging.py:144 >> {'loss': 0.4354, 'learning_rate': 4.6563e-05, 'epoch': 0.51, 'throughput': 558.09}
[INFO|2026-04-21 07:32:16] logging.py:144 >> {'loss': 0.4376, 'learning_rate': 4.6519e-05, 'epoch': 0.51, 'throughput': 558.68}
[INFO|2026-04-21 07:37:28] logging.py:144 >> {'loss': 0.4368, 'learning_rate': 4.6475e-05, 'epoch': 0.52, 'throughput': 559.30}
[INFO|2026-04-21 07:42:49] logging.py:144 >> {'loss': 0.4263, 'learning_rate': 4.6431e-05, 'epoch': 0.52, 'throughput': 559.85}
[INFO|2026-04-21 07:48:04] logging.py:144 >> {'loss': 0.4542, 'learning_rate': 4.6386e-05, 'epoch': 0.52, 'throughput': 560.37}
[INFO|2026-04-21 07:53:38] logging.py:144 >> {'loss': 0.4373, 'learning_rate': 4.6341e-05, 'epoch': 0.53, 'throughput': 560.75}
[INFO|2026-04-21 07:58:42] logging.py:144 >> {'loss': 0.4468, 'learning_rate': 4.6296e-05, 'epoch': 0.53, 'throughput': 561.46}
[INFO|2026-04-21 08:04:02] logging.py:144 >> {'loss': 0.4501, 'learning_rate': 4.6250e-05, 'epoch': 0.53, 'throughput': 561.95}
[INFO|2026-04-21 08:09:10] logging.py:144 >> {'loss': 0.4332, 'learning_rate': 4.6204e-05, 'epoch': 0.54, 'throughput': 562.41}
[INFO|2026-04-21 08:14:14] logging.py:144 >> {'loss': 0.4437, 'learning_rate': 4.6158e-05, 'epoch': 0.54, 'throughput': 563.10}
[INFO|2026-04-21 08:19:01] logging.py:144 >> {'loss': 0.4398, 'learning_rate': 4.6112e-05, 'epoch': 0.54, 'throughput': 563.86}
[INFO|2026-04-21 08:24:33] logging.py:144 >> {'loss': 0.4515, 'learning_rate': 4.6065e-05, 'epoch': 0.55, 'throughput': 564.26}
[INFO|2026-04-21 08:29:45] logging.py:144 >> {'loss': 0.4467, 'learning_rate': 4.6019e-05, 'epoch': 0.55, 'throughput': 564.73}
[INFO|2026-04-21 08:35:18] logging.py:144 >> {'loss': 0.4418, 'learning_rate': 4.5971e-05, 'epoch': 0.55, 'throughput': 565.09}
[INFO|2026-04-21 08:40:28] logging.py:144 >> {'loss': 0.4422, 'learning_rate': 4.5924e-05, 'epoch': 0.56, 'throughput': 565.57}
[INFO|2026-04-21 08:45:30] logging.py:144 >> {'loss': 0.4384, 'learning_rate': 4.5877e-05, 'epoch': 0.56, 'throughput': 566.27}
[INFO|2026-04-21 08:50:41] logging.py:144 >> {'loss': 0.4343, 'learning_rate': 4.5829e-05, 'epoch': 0.56, 'throughput': 566.78}
[INFO|2026-04-21 08:55:35] logging.py:144 >> {'loss': 0.4417, 'learning_rate': 4.5781e-05, 'epoch': 0.57, 'throughput': 567.36}
[INFO|2026-04-21 09:00:58] logging.py:144 >> {'loss': 0.4214, 'learning_rate': 4.5732e-05, 'epoch': 0.57, 'throughput': 567.80}
[INFO|2026-04-21 09:06:08] logging.py:144 >> {'loss': 0.4365, 'learning_rate': 4.5684e-05, 'epoch': 0.57, 'throughput': 568.38}
[INFO|2026-04-21 09:11:49] logging.py:144 >> {'loss': 0.4447, 'learning_rate': 4.5635e-05, 'epoch': 0.58, 'throughput': 568.60}
[INFO|2026-04-21 09:17:49] logging.py:144 >> {'loss': 0.4361, 'learning_rate': 4.5586e-05, 'epoch': 0.58, 'throughput': 568.72}
[INFO|2026-04-21 09:23:28] logging.py:144 >> {'loss': 0.4435, 'learning_rate': 4.5537e-05, 'epoch': 0.58, 'throughput': 568.98}
[INFO|2026-04-21 09:29:05] logging.py:144 >> {'loss': 0.4392, 'learning_rate': 4.5487e-05, 'epoch': 0.59, 'throughput': 569.23}
[INFO|2026-04-21 09:34:46] logging.py:144 >> {'loss': 0.4473, 'learning_rate': 4.5437e-05, 'epoch': 0.59, 'throughput': 569.50}
[INFO|2026-04-21 09:40:37] logging.py:144 >> {'loss': 0.4429, 'learning_rate': 4.5387e-05, 'epoch': 0.59, 'throughput': 569.77}
[INFO|2026-04-21 09:46:23] logging.py:144 >> {'loss': 0.4382, 'learning_rate': 4.5337e-05, 'epoch': 0.60, 'throughput': 570.07}
[INFO|2026-04-21 09:51:47] logging.py:144 >> {'loss': 0.4445, 'learning_rate': 4.5286e-05, 'epoch': 0.60, 'throughput': 570.36}
[INFO|2026-04-21 09:57:25] logging.py:144 >> {'loss': 0.4294, 'learning_rate': 4.5236e-05, 'epoch': 0.60, 'throughput': 570.56}
[INFO|2026-04-21 10:03:15] logging.py:144 >> {'loss': 0.4395, 'learning_rate': 4.5185e-05, 'epoch': 0.61, 'throughput': 570.73}
[INFO|2026-04-21 10:08:38] logging.py:144 >> {'loss': 0.4484, 'learning_rate': 4.5133e-05, 'epoch': 0.61, 'throughput': 571.06}
[INFO|2026-04-21 10:13:29] logging.py:144 >> {'loss': 0.4322, 'learning_rate': 4.5082e-05, 'epoch': 0.61, 'throughput': 571.65}
[INFO|2026-04-21 10:18:27] logging.py:144 >> {'loss': 0.4433, 'learning_rate': 4.5030e-05, 'epoch': 0.62, 'throughput': 572.09}
[INFO|2026-04-21 10:23:20] logging.py:144 >> {'loss': 0.4428, 'learning_rate': 4.4978e-05, 'epoch': 0.62, 'throughput': 572.66}
[INFO|2026-04-21 10:28:27] logging.py:144 >> {'loss': 0.4331, 'learning_rate': 4.4926e-05, 'epoch': 0.62, 'throughput': 573.11}
[INFO|2026-04-21 10:33:57] logging.py:144 >> {'loss': 0.4346, 'learning_rate': 4.4873e-05, 'epoch': 0.63, 'throughput': 573.42}
[INFO|2026-04-21 10:39:18] logging.py:144 >> {'loss': 0.4366, 'learning_rate': 4.4821e-05, 'epoch': 0.63, 'throughput': 573.80}
[INFO|2026-04-21 10:44:33] logging.py:144 >> {'loss': 0.4359, 'learning_rate': 4.4768e-05, 'epoch': 0.63, 'throughput': 574.28}
[INFO|2026-04-21 10:49:45] logging.py:144 >> {'loss': 0.4304, 'learning_rate': 4.4714e-05, 'epoch': 0.64, 'throughput': 574.73}
[INFO|2026-04-21 10:54:28] logging.py:144 >> {'loss': 0.4471, 'learning_rate': 4.4661e-05, 'epoch': 0.64, 'throughput': 575.28}
[INFO|2026-04-21 10:59:14] logging.py:144 >> {'loss': 0.4176, 'learning_rate': 4.4607e-05, 'epoch': 0.64, 'throughput': 575.91}
[INFO|2026-04-21 11:03:45] logging.py:144 >> {'loss': 0.4505, 'learning_rate': 4.4554e-05, 'epoch': 0.65, 'throughput': 576.53}
[INFO|2026-04-21 11:08:37] logging.py:144 >> {'loss': 0.4302, 'learning_rate': 4.4499e-05, 'epoch': 0.65, 'throughput': 577.01}
[INFO|2026-04-21 11:13:33] logging.py:144 >> {'loss': 0.4324, 'learning_rate': 4.4445e-05, 'epoch': 0.65, 'throughput': 577.58}
[INFO|2026-04-21 11:18:21] logging.py:144 >> {'loss': 0.4238, 'learning_rate': 4.4390e-05, 'epoch': 0.66, 'throughput': 578.25}
[INFO|2026-04-21 11:23:17] logging.py:144 >> {'loss': 0.4452, 'learning_rate': 4.4336e-05, 'epoch': 0.66, 'throughput': 578.82}
[INFO|2026-04-21 11:28:06] logging.py:144 >> {'loss': 0.4383, 'learning_rate': 4.4281e-05, 'epoch': 0.66, 'throughput': 579.45}
[INFO|2026-04-21 11:32:54] logging.py:144 >> {'loss': 0.4315, 'learning_rate': 4.4225e-05, 'epoch': 0.67, 'throughput': 580.03}
[INFO|2026-04-21 11:37:43] logging.py:144 >> {'loss': 0.4368, 'learning_rate': 4.4170e-05, 'epoch': 0.67, 'throughput': 580.65}
[INFO|2026-04-21 11:43:15] logging.py:144 >> {'loss': 0.4510, 'learning_rate': 4.4114e-05, 'epoch': 0.67, 'throughput': 580.89}
[INFO|2026-04-21 11:48:38] logging.py:144 >> {'loss': 0.4389, 'learning_rate': 4.4058e-05, 'epoch': 0.68, 'throughput': 581.28}
[INFO|2026-04-21 11:53:44] logging.py:144 >> {'loss': 0.4312, 'learning_rate': 4.4002e-05, 'epoch': 0.68, 'throughput': 581.55}
[INFO|2026-04-21 11:58:16] logging.py:144 >> {'loss': 0.4344, 'learning_rate': 4.3945e-05, 'epoch': 0.68, 'throughput': 582.24}
[INFO|2026-04-21 12:03:39] logging.py:144 >> {'loss': 0.4594, 'learning_rate': 4.3889e-05, 'epoch': 0.69, 'throughput': 582.58}
[INFO|2026-04-21 12:08:38] logging.py:144 >> {'loss': 0.4236, 'learning_rate': 4.3832e-05, 'epoch': 0.69, 'throughput': 583.12}
[INFO|2026-04-21 12:13:54] logging.py:144 >> {'loss': 0.4276, 'learning_rate': 4.3775e-05, 'epoch': 0.69, 'throughput': 583.52}
[INFO|2026-04-21 12:18:58] logging.py:144 >> {'loss': 0.4342, 'learning_rate': 4.3717e-05, 'epoch': 0.70, 'throughput': 583.92}
[INFO|2026-04-21 12:23:53] logging.py:144 >> {'loss': 0.4416, 'learning_rate': 4.3660e-05, 'epoch': 0.70, 'throughput': 584.43}
[INFO|2026-04-21 12:29:04] logging.py:144 >> {'loss': 0.4488, 'learning_rate': 4.3602e-05, 'epoch': 0.70, 'throughput': 584.83}
[INFO|2026-04-21 12:34:48] logging.py:144 >> {'loss': 0.4378, 'learning_rate': 4.3544e-05, 'epoch': 0.71, 'throughput': 584.99}
[INFO|2026-04-21 12:40:34] logging.py:144 >> {'loss': 0.4362, 'learning_rate': 4.3486e-05, 'epoch': 0.71, 'throughput': 585.04}
[INFO|2026-04-21 12:45:55] logging.py:144 >> {'loss': 0.4446, 'learning_rate': 4.3427e-05, 'epoch': 0.71, 'throughput': 585.35}
[INFO|2026-04-21 12:51:04] logging.py:144 >> {'loss': 0.4410, 'learning_rate': 4.3368e-05, 'epoch': 0.72, 'throughput': 585.67}
[INFO|2026-04-21 12:56:41] logging.py:144 >> {'loss': 0.4256, 'learning_rate': 4.3309e-05, 'epoch': 0.72, 'throughput': 585.80}
[INFO|2026-04-21 13:01:59] logging.py:144 >> {'loss': 0.4468, 'learning_rate': 4.3250e-05, 'epoch': 0.72, 'throughput': 586.13}
[INFO|2026-04-21 13:07:12] logging.py:144 >> {'loss': 0.4363, 'learning_rate': 4.3191e-05, 'epoch': 0.73, 'throughput': 586.48}
[INFO|2026-04-21 13:12:23] logging.py:144 >> {'loss': 0.4384, 'learning_rate': 4.3131e-05, 'epoch': 0.73, 'throughput': 586.77}
[INFO|2026-04-21 13:17:35] logging.py:144 >> {'loss': 0.4106, 'learning_rate': 4.3072e-05, 'epoch': 0.73, 'throughput': 587.16}
[INFO|2026-04-21 13:23:07] logging.py:144 >> {'loss': 0.4428, 'learning_rate': 4.3012e-05, 'epoch': 0.74, 'throughput': 587.32}
[INFO|2026-04-21 13:28:08] logging.py:144 >> {'loss': 0.4438, 'learning_rate': 4.2951e-05, 'epoch': 0.74, 'throughput': 587.65}
[INFO|2026-04-21 13:33:25] logging.py:144 >> {'loss': 0.4322, 'learning_rate': 4.2891e-05, 'epoch': 0.74, 'throughput': 587.99}
[INFO|2026-04-21 13:38:43] logging.py:144 >> {'loss': 0.4373, 'learning_rate': 4.2830e-05, 'epoch': 0.75, 'throughput': 588.31}
[INFO|2026-04-21 13:44:07] logging.py:144 >> {'loss': 0.4358, 'learning_rate': 4.2769e-05, 'epoch': 0.75, 'throughput': 588.59}
[INFO|2026-04-21 13:49:25] logging.py:144 >> {'loss': 0.4268, 'learning_rate': 4.2708e-05, 'epoch': 0.75, 'throughput': 588.90}
[INFO|2026-04-21 13:55:06] logging.py:144 >> {'loss': 0.4380, 'learning_rate': 4.2647e-05, 'epoch': 0.76, 'throughput': 589.00}
[INFO|2026-04-21 14:00:15] logging.py:144 >> {'loss': 0.4525, 'learning_rate': 4.2585e-05, 'epoch': 0.76, 'throughput': 589.33}
[INFO|2026-04-21 14:05:44] logging.py:144 >> {'loss': 0.4329, 'learning_rate': 4.2524e-05, 'epoch': 0.76, 'throughput': 589.63}
[INFO|2026-04-21 14:11:07] logging.py:144 >> {'loss': 0.4324, 'learning_rate': 4.2462e-05, 'epoch': 0.77, 'throughput': 589.82}
[INFO|2026-04-21 14:16:37] logging.py:144 >> {'loss': 0.4258, 'learning_rate': 4.2400e-05, 'epoch': 0.77, 'throughput': 589.99}
[INFO|2026-04-21 14:22:11] logging.py:144 >> {'loss': 0.4328, 'learning_rate': 4.2337e-05, 'epoch': 0.77, 'throughput': 590.13}
[INFO|2026-04-21 14:27:08] logging.py:144 >> {'loss': 0.4234, 'learning_rate': 4.2275e-05, 'epoch': 0.78, 'throughput': 590.54}
[INFO|2026-04-21 14:32:12] logging.py:144 >> {'loss': 0.4280, 'learning_rate': 4.2212e-05, 'epoch': 0.78, 'throughput': 590.89}
[INFO|2026-04-21 14:37:43] logging.py:144 >> {'loss': 0.4165, 'learning_rate': 4.2149e-05, 'epoch': 0.78, 'throughput': 591.02}
[INFO|2026-04-21 14:43:14] logging.py:144 >> {'loss': 0.4364, 'learning_rate': 4.2086e-05, 'epoch': 0.79, 'throughput': 591.24}
[INFO|2026-04-21 14:49:18] logging.py:144 >> {'loss': 0.4244, 'learning_rate': 4.2022e-05, 'epoch': 0.79, 'throughput': 591.19}
[INFO|2026-04-21 14:55:06] logging.py:144 >> {'loss': 0.4166, 'learning_rate': 4.1959e-05, 'epoch': 0.79, 'throughput': 591.28}
[INFO|2026-04-21 15:00:31] logging.py:144 >> {'loss': 0.4220, 'learning_rate': 4.1895e-05, 'epoch': 0.80, 'throughput': 591.51}
[INFO|2026-04-21 15:06:31] logging.py:144 >> {'loss': 0.4357, 'learning_rate': 4.1831e-05, 'epoch': 0.80, 'throughput': 591.57}
[INFO|2026-04-21 15:12:23] logging.py:144 >> {'loss': 0.4283, 'learning_rate': 4.1767e-05, 'epoch': 0.80, 'throughput': 591.57}
[INFO|2026-04-21 15:17:38] logging.py:144 >> {'loss': 0.4342, 'learning_rate': 4.1702e-05, 'epoch': 0.81, 'throughput': 591.84}
[INFO|2026-04-21 15:23:16] logging.py:144 >> {'loss': 0.4343, 'learning_rate': 4.1638e-05, 'epoch': 0.81, 'throughput': 591.92}
[INFO|2026-04-21 15:29:40] logging.py:144 >> {'loss': 0.4251, 'learning_rate': 4.1573e-05, 'epoch': 0.81, 'throughput': 591.76}
[INFO|2026-04-21 15:35:35] logging.py:144 >> {'loss': 0.4270, 'learning_rate': 4.1508e-05, 'epoch': 0.82, 'throughput': 591.82}
[INFO|2026-04-21 15:42:09] logging.py:144 >> {'loss': 0.4357, 'learning_rate': 4.1443e-05, 'epoch': 0.82, 'throughput': 591.57}
[INFO|2026-04-21 15:47:42] logging.py:144 >> {'loss': 0.4357, 'learning_rate': 4.1377e-05, 'epoch': 0.82, 'throughput': 591.67}
[INFO|2026-04-21 15:53:55] logging.py:144 >> {'loss': 0.4438, 'learning_rate': 4.1312e-05, 'epoch': 0.83, 'throughput': 591.58}
[INFO|2026-04-21 16:00:00] logging.py:144 >> {'loss': 0.4339, 'learning_rate': 4.1246e-05, 'epoch': 0.83, 'throughput': 591.50}
[INFO|2026-04-21 16:05:02] logging.py:144 >> {'loss': 0.4290, 'learning_rate': 4.1180e-05, 'epoch': 0.83, 'throughput': 591.84}
[INFO|2026-04-21 16:10:32] logging.py:144 >> {'loss': 0.4255, 'learning_rate': 4.1114e-05, 'epoch': 0.84, 'throughput': 592.03}
[INFO|2026-04-21 16:15:52] logging.py:144 >> {'loss': 0.4403, 'learning_rate': 4.1048e-05, 'epoch': 0.84, 'throughput': 592.26}
[INFO|2026-04-21 16:21:27] logging.py:144 >> {'loss': 0.4338, 'learning_rate': 4.0981e-05, 'epoch': 0.84, 'throughput': 592.34}
[INFO|2026-04-21 16:27:23] logging.py:144 >> {'loss': 0.4275, 'learning_rate': 4.0914e-05, 'epoch': 0.85, 'throughput': 592.38}
[INFO|2026-04-21 16:33:36] logging.py:144 >> {'loss': 0.4287, 'learning_rate': 4.0847e-05, 'epoch': 0.85, 'throughput': 592.30}
[INFO|2026-04-21 16:39:12] logging.py:144 >> {'loss': 0.4217, 'learning_rate': 4.0780e-05, 'epoch': 0.85, 'throughput': 592.42}
[INFO|2026-04-21 16:44:44] logging.py:144 >> {'loss': 0.4260, 'learning_rate': 4.0713e-05, 'epoch': 0.86, 'throughput': 592.60}
[INFO|2026-04-21 16:50:14] logging.py:144 >> {'loss': 0.4313, 'learning_rate': 4.0645e-05, 'epoch': 0.86, 'throughput': 592.72}
[INFO|2026-04-21 16:56:39] logging.py:144 >> {'loss': 0.4205, 'learning_rate': 4.0578e-05, 'epoch': 0.86, 'throughput': 592.51}
[INFO|2026-04-21 17:02:36] logging.py:144 >> {'loss': 0.4207, 'learning_rate': 4.0510e-05, 'epoch': 0.87, 'throughput': 592.49}
[INFO|2026-04-21 17:09:14] logging.py:144 >> {'loss': 0.4227, 'learning_rate': 4.0442e-05, 'epoch': 0.87, 'throughput': 592.26}
[INFO|2026-04-21 17:15:01] logging.py:144 >> {'loss': 0.4411, 'learning_rate': 4.0373e-05, 'epoch': 0.87, 'throughput': 592.31}
[INFO|2026-04-21 17:21:01] logging.py:144 >> {'loss': 0.4170, 'learning_rate': 4.0305e-05, 'epoch': 0.88, 'throughput': 592.35}
[INFO|2026-04-21 17:27:01] logging.py:144 >> {'loss': 0.4379, 'learning_rate': 4.0236e-05, 'epoch': 0.88, 'throughput': 592.31}
[INFO|2026-04-21 17:32:44] logging.py:144 >> {'loss': 0.4235, 'learning_rate': 4.0167e-05, 'epoch': 0.88, 'throughput': 592.38}
[INFO|2026-04-21 17:38:31] logging.py:144 >> {'loss': 0.4373, 'learning_rate': 4.0098e-05, 'epoch': 0.89, 'throughput': 592.39}
[INFO|2026-04-21 17:44:13] logging.py:144 >> {'loss': 0.4228, 'learning_rate': 4.0029e-05, 'epoch': 0.89, 'throughput': 592.51}
[INFO|2026-04-21 17:49:56] logging.py:144 >> {'loss': 0.4389, 'learning_rate': 3.9960e-05, 'epoch': 0.89, 'throughput': 592.57}
[INFO|2026-04-21 17:55:40] logging.py:144 >> {'loss': 0.4256, 'learning_rate': 3.9890e-05, 'epoch': 0.90, 'throughput': 592.65}
[INFO|2026-04-21 18:01:16] logging.py:144 >> {'loss': 0.4395, 'learning_rate': 3.9821e-05, 'epoch': 0.90, 'throughput': 592.73}
[INFO|2026-04-21 18:07:05] logging.py:144 >> {'loss': 0.4311, 'learning_rate': 3.9751e-05, 'epoch': 0.90, 'throughput': 592.70}
[INFO|2026-04-21 18:12:42] logging.py:144 >> {'loss': 0.4224, 'learning_rate': 3.9681e-05, 'epoch': 0.91, 'throughput': 592.76}
[INFO|2026-04-21 18:18:27] logging.py:144 >> {'loss': 0.4272, 'learning_rate': 3.9610e-05, 'epoch': 0.91, 'throughput': 592.76}
[INFO|2026-04-21 18:23:39] logging.py:144 >> {'loss': 0.4270, 'learning_rate': 3.9540e-05, 'epoch': 0.91, 'throughput': 592.96}
[INFO|2026-04-21 18:29:43] logging.py:144 >> {'loss': 0.4242, 'learning_rate': 3.9469e-05, 'epoch': 0.92, 'throughput': 592.95}
[INFO|2026-04-21 18:36:21] logging.py:144 >> {'loss': 0.4298, 'learning_rate': 3.9399e-05, 'epoch': 0.92, 'throughput': 592.65}
[INFO|2026-04-21 18:41:56] logging.py:144 >> {'loss': 0.4302, 'learning_rate': 3.9328e-05, 'epoch': 0.92, 'throughput': 592.74}
[INFO|2026-04-21 18:47:42] logging.py:144 >> {'loss': 0.4215, 'learning_rate': 3.9256e-05, 'epoch': 0.93, 'throughput': 592.80}
[INFO|2026-04-21 18:53:26] logging.py:144 >> {'loss': 0.4312, 'learning_rate': 3.9185e-05, 'epoch': 0.93, 'throughput': 592.80}
[INFO|2026-04-21 18:59:01] logging.py:144 >> {'loss': 0.4252, 'learning_rate': 3.9114e-05, 'epoch': 0.93, 'throughput': 592.93}
[INFO|2026-04-21 19:04:32] logging.py:144 >> {'loss': 0.4389, 'learning_rate': 3.9042e-05, 'epoch': 0.94, 'throughput': 593.07}
[INFO|2026-04-21 19:10:01] logging.py:144 >> {'loss': 0.4274, 'learning_rate': 3.8970e-05, 'epoch': 0.94, 'throughput': 593.19}
[INFO|2026-04-21 19:15:19] logging.py:144 >> {'loss': 0.4174, 'learning_rate': 3.8898e-05, 'epoch': 0.94, 'throughput': 593.47}
[INFO|2026-04-21 19:20:53] logging.py:144 >> {'loss': 0.4251, 'learning_rate': 3.8826e-05, 'epoch': 0.95, 'throughput': 593.62}
[INFO|2026-04-21 19:26:15] logging.py:144 >> {'loss': 0.4294, 'learning_rate': 3.8754e-05, 'epoch': 0.95, 'throughput': 593.83}
[INFO|2026-04-21 19:31:52] logging.py:144 >> {'loss': 0.4251, 'learning_rate': 3.8681e-05, 'epoch': 0.95, 'throughput': 593.94}
[INFO|2026-04-21 19:37:40] logging.py:144 >> {'loss': 0.4265, 'learning_rate': 3.8609e-05, 'epoch': 0.96, 'throughput': 593.98}
[INFO|2026-04-21 19:43:38] logging.py:144 >> {'loss': 0.4325, 'learning_rate': 3.8536e-05, 'epoch': 0.96, 'throughput': 593.91}
[INFO|2026-04-21 19:49:08] logging.py:144 >> {'loss': 0.4243, 'learning_rate': 3.8463e-05, 'epoch': 0.96, 'throughput': 593.99}
[INFO|2026-04-21 19:54:50] logging.py:144 >> {'loss': 0.4226, 'learning_rate': 3.8390e-05, 'epoch': 0.97, 'throughput': 594.04}
[INFO|2026-04-21 20:00:43] logging.py:144 >> {'loss': 0.4229, 'learning_rate': 3.8317e-05, 'epoch': 0.97, 'throughput': 594.01}
[INFO|2026-04-21 20:06:18] logging.py:144 >> {'loss': 0.4292, 'learning_rate': 3.8243e-05, 'epoch': 0.97, 'throughput': 594.12}
[INFO|2026-04-21 20:11:37] logging.py:144 >> {'loss': 0.4378, 'learning_rate': 3.8169e-05, 'epoch': 0.98, 'throughput': 594.33}
[INFO|2026-04-21 20:17:03] logging.py:144 >> {'loss': 0.4230, 'learning_rate': 3.8096e-05, 'epoch': 0.98, 'throughput': 594.50}
[INFO|2026-04-21 20:22:53] logging.py:144 >> {'loss': 0.4289, 'learning_rate': 3.8022e-05, 'epoch': 0.98, 'throughput': 594.57}
[INFO|2026-04-21 20:28:58] logging.py:144 >> {'loss': 0.4204, 'learning_rate': 3.7948e-05, 'epoch': 0.99, 'throughput': 594.49}
[INFO|2026-04-21 20:34:08] logging.py:144 >> {'loss': 0.4255, 'learning_rate': 3.7873e-05, 'epoch': 0.99, 'throughput': 594.71}
[INFO|2026-04-21 20:39:30] logging.py:144 >> {'loss': 0.4191, 'learning_rate': 3.7799e-05, 'epoch': 0.99, 'throughput': 594.86}
[INFO|2026-04-21 20:44:46] logging.py:144 >> {'loss': 0.4248, 'learning_rate': 3.7725e-05, 'epoch': 1.00, 'throughput': 595.11}
[INFO|2026-04-21 20:50:46] logging.py:144 >> {'loss': 0.4201, 'learning_rate': 3.7650e-05, 'epoch': 1.00, 'throughput': 595.09}
[INFO|2026-04-21 20:51:26] logging.py:144 >> {'loss': 0.3795, 'learning_rate': 3.7575e-05, 'epoch': 1.00, 'throughput': 595.12}
[INFO|2026-04-21 20:56:58] logging.py:144 >> {'loss': 0.3040, 'learning_rate': 3.7500e-05, 'epoch': 1.00, 'throughput': 595.21}
[INFO|2026-04-21 21:02:21] logging.py:144 >> {'loss': 0.2919, 'learning_rate': 3.7425e-05, 'epoch': 1.01, 'throughput': 595.35}
[INFO|2026-04-21 21:07:31] logging.py:144 >> {'loss': 0.2877, 'learning_rate': 3.7350e-05, 'epoch': 1.01, 'throughput': 595.65}
[INFO|2026-04-21 21:12:52] logging.py:144 >> {'loss': 0.2927, 'learning_rate': 3.7274e-05, 'epoch': 1.01, 'throughput': 595.84}
[INFO|2026-04-21 21:18:15] logging.py:144 >> {'loss': 0.2891, 'learning_rate': 3.7199e-05, 'epoch': 1.02, 'throughput': 595.89}
[INFO|2026-04-21 21:23:29] logging.py:144 >> {'loss': 0.2905, 'learning_rate': 3.7123e-05, 'epoch': 1.02, 'throughput': 596.11}
[INFO|2026-04-21 21:29:28] logging.py:144 >> {'loss': 0.2945, 'learning_rate': 3.7047e-05, 'epoch': 1.02, 'throughput': 596.09}
[INFO|2026-04-21 21:35:21] logging.py:144 >> {'loss': 0.2926, 'learning_rate': 3.6971e-05, 'epoch': 1.03, 'throughput': 596.07}
[INFO|2026-04-21 21:41:14] logging.py:144 >> {'loss': 0.2882, 'learning_rate': 3.6895e-05, 'epoch': 1.03, 'throughput': 596.07}
[INFO|2026-04-21 21:46:52] logging.py:144 >> {'loss': 0.2864, 'learning_rate': 3.6818e-05, 'epoch': 1.03, 'throughput': 596.20}
[INFO|2026-04-21 21:52:39] logging.py:144 >> {'loss': 0.2747, 'learning_rate': 3.6742e-05, 'epoch': 1.04, 'throughput': 596.19}
[INFO|2026-04-21 21:58:21] logging.py:144 >> {'loss': 0.2784, 'learning_rate': 3.6665e-05, 'epoch': 1.04, 'throughput': 596.24}
[INFO|2026-04-21 22:03:59] logging.py:144 >> {'loss': 0.2700, 'learning_rate': 3.6589e-05, 'epoch': 1.04, 'throughput': 596.33}
[INFO|2026-04-21 22:09:38] logging.py:144 >> {'loss': 0.2937, 'learning_rate': 3.6512e-05, 'epoch': 1.05, 'throughput': 596.47}
[INFO|2026-04-21 22:15:02] logging.py:144 >> {'loss': 0.2836, 'learning_rate': 3.6435e-05, 'epoch': 1.05, 'throughput': 596.56}
[INFO|2026-04-21 22:20:21] logging.py:144 >> {'loss': 0.2832, 'learning_rate': 3.6357e-05, 'epoch': 1.05, 'throughput': 596.72}
[INFO|2026-04-21 22:25:51] logging.py:144 >> {'loss': 0.2754, 'learning_rate': 3.6280e-05, 'epoch': 1.06, 'throughput': 596.80}
[INFO|2026-04-21 22:31:27] logging.py:144 >> {'loss': 0.2778, 'learning_rate': 3.6203e-05, 'epoch': 1.06, 'throughput': 596.97}
[INFO|2026-04-21 22:36:55] logging.py:144 >> {'loss': 0.2807, 'learning_rate': 3.6125e-05, 'epoch': 1.06, 'throughput': 597.07}
[INFO|2026-04-21 22:42:33] logging.py:144 >> {'loss': 0.2865, 'learning_rate': 3.6047e-05, 'epoch': 1.07, 'throughput': 597.10}
[INFO|2026-04-21 22:48:01] logging.py:144 >> {'loss': 0.2833, 'learning_rate': 3.5970e-05, 'epoch': 1.07, 'throughput': 597.22}
[INFO|2026-04-21 22:53:23] logging.py:144 >> {'loss': 0.2791, 'learning_rate': 3.5892e-05, 'epoch': 1.07, 'throughput': 597.32}
[INFO|2026-04-21 22:58:42] logging.py:144 >> {'loss': 0.2753, 'learning_rate': 3.5814e-05, 'epoch': 1.08, 'throughput': 597.48}
[INFO|2026-04-21 23:04:08] logging.py:144 >> {'loss': 0.2673, 'learning_rate': 3.5735e-05, 'epoch': 1.08, 'throughput': 597.59}
[INFO|2026-04-21 23:09:36] logging.py:144 >> {'loss': 0.2821, 'learning_rate': 3.5657e-05, 'epoch': 1.08, 'throughput': 597.72}
[INFO|2026-04-21 23:15:08] logging.py:144 >> {'loss': 0.2856, 'learning_rate': 3.5579e-05, 'epoch': 1.09, 'throughput': 597.83}
[INFO|2026-04-21 23:20:30] logging.py:144 >> {'loss': 0.2782, 'learning_rate': 3.5500e-05, 'epoch': 1.09, 'throughput': 597.99}
[INFO|2026-04-21 23:26:17] logging.py:144 >> {'loss': 0.2770, 'learning_rate': 3.5421e-05, 'epoch': 1.09, 'throughput': 598.01}
[INFO|2026-04-21 23:31:52] logging.py:144 >> {'loss': 0.2687, 'learning_rate': 3.5342e-05, 'epoch': 1.10, 'throughput': 598.09}
[INFO|2026-04-21 23:36:57] logging.py:144 >> {'loss': 0.2748, 'learning_rate': 3.5263e-05, 'epoch': 1.10, 'throughput': 598.39}
[INFO|2026-04-21 23:42:10] logging.py:144 >> {'loss': 0.2827, 'learning_rate': 3.5184e-05, 'epoch': 1.10, 'throughput': 598.62}
[INFO|2026-04-21 23:47:39] logging.py:144 >> {'loss': 0.2690, 'learning_rate': 3.5105e-05, 'epoch': 1.11, 'throughput': 598.71}
[INFO|2026-04-21 23:52:47] logging.py:144 >> {'loss': 0.2743, 'learning_rate': 3.5026e-05, 'epoch': 1.11, 'throughput': 598.96}
[INFO|2026-04-21 23:58:09] logging.py:144 >> {'loss': 0.2685, 'learning_rate': 3.4946e-05, 'epoch': 1.11, 'throughput': 599.06}
[INFO|2026-04-22 00:03:46] logging.py:144 >> {'loss': 0.2690, 'learning_rate': 3.4867e-05, 'epoch': 1.12, 'throughput': 599.07}
[INFO|2026-04-22 00:09:34] logging.py:144 >> {'loss': 0.2700, 'learning_rate': 3.4787e-05, 'epoch': 1.12, 'throughput': 599.07}
[INFO|2026-04-22 00:15:07] logging.py:144 >> {'loss': 0.2767, 'learning_rate': 3.4707e-05, 'epoch': 1.12, 'throughput': 599.14}
[INFO|2026-04-22 00:21:16] logging.py:144 >> {'loss': 0.2739, 'learning_rate': 3.4627e-05, 'epoch': 1.13, 'throughput': 599.05}
[INFO|2026-04-22 00:26:55] logging.py:144 >> {'loss': 0.2712, 'learning_rate': 3.4547e-05, 'epoch': 1.13, 'throughput': 599.08}
[INFO|2026-04-22 00:32:41] logging.py:144 >> {'loss': 0.2786, 'learning_rate': 3.4467e-05, 'epoch': 1.13, 'throughput': 599.10}
[INFO|2026-04-22 00:38:29] logging.py:144 >> {'loss': 0.2722, 'learning_rate': 3.4387e-05, 'epoch': 1.14, 'throughput': 599.12}
[INFO|2026-04-22 00:44:05] logging.py:144 >> {'loss': 0.2708, 'learning_rate': 3.4306e-05, 'epoch': 1.14, 'throughput': 599.19}
[INFO|2026-04-22 00:49:41] logging.py:144 >> {'loss': 0.2787, 'learning_rate': 3.4226e-05, 'epoch': 1.14, 'throughput': 599.27}
[INFO|2026-04-22 00:55:25] logging.py:144 >> {'loss': 0.2796, 'learning_rate': 3.4145e-05, 'epoch': 1.15, 'throughput': 599.25}
[INFO|2026-04-22 01:00:42] logging.py:144 >> {'loss': 0.2782, 'learning_rate': 3.4064e-05, 'epoch': 1.15, 'throughput': 599.49}
[INFO|2026-04-22 01:06:02] logging.py:144 >> {'loss': 0.2750, 'learning_rate': 3.3983e-05, 'epoch': 1.15, 'throughput': 599.61}
[INFO|2026-04-22 01:11:24] logging.py:144 >> {'loss': 0.2625, 'learning_rate': 3.3903e-05, 'epoch': 1.16, 'throughput': 599.73}
[INFO|2026-04-22 01:17:03] logging.py:144 >> {'loss': 0.2764, 'learning_rate': 3.3821e-05, 'epoch': 1.16, 'throughput': 599.80}
[INFO|2026-04-22 01:22:50] logging.py:144 >> {'loss': 0.2801, 'learning_rate': 3.3740e-05, 'epoch': 1.16, 'throughput': 599.82}
[INFO|2026-04-22 01:28:28] logging.py:144 >> {'loss': 0.2800, 'learning_rate': 3.3659e-05, 'epoch': 1.17, 'throughput': 599.86}
[INFO|2026-04-22 01:33:56] logging.py:144 >> {'loss': 0.2793, 'learning_rate': 3.3578e-05, 'epoch': 1.17, 'throughput': 599.99}
[INFO|2026-04-22 01:38:47] logging.py:144 >> {'loss': 0.2807, 'learning_rate': 3.3496e-05, 'epoch': 1.17, 'throughput': 600.29}
[INFO|2026-04-22 01:43:51] logging.py:144 >> {'loss': 0.2770, 'learning_rate': 3.3415e-05, 'epoch': 1.18, 'throughput': 600.48}
[INFO|2026-04-22 01:49:04] logging.py:144 >> {'loss': 0.2762, 'learning_rate': 3.3333e-05, 'epoch': 1.18, 'throughput': 600.70}
[INFO|2026-04-22 01:53:53] logging.py:144 >> {'loss': 0.2842, 'learning_rate': 3.3251e-05, 'epoch': 1.18, 'throughput': 600.97}
[INFO|2026-04-22 01:59:14] logging.py:144 >> {'loss': 0.2723, 'learning_rate': 3.3169e-05, 'epoch': 1.19, 'throughput': 601.15}
[INFO|2026-04-22 02:03:44] logging.py:144 >> {'loss': 0.2604, 'learning_rate': 3.3087e-05, 'epoch': 1.19, 'throughput': 601.50}
[INFO|2026-04-22 02:08:44] logging.py:144 >> {'loss': 0.2824, 'learning_rate': 3.3005e-05, 'epoch': 1.19, 'throughput': 601.72}
[INFO|2026-04-22 02:13:56] logging.py:144 >> {'loss': 0.2702, 'learning_rate': 3.2923e-05, 'epoch': 1.20, 'throughput': 601.87}
[INFO|2026-04-22 02:19:02] logging.py:144 >> {'loss': 0.2753, 'learning_rate': 3.2841e-05, 'epoch': 1.20, 'throughput': 602.07}
[INFO|2026-04-22 02:23:59] logging.py:144 >> {'loss': 0.2790, 'learning_rate': 3.2758e-05, 'epoch': 1.20, 'throughput': 602.32}
[INFO|2026-04-22 02:28:50] logging.py:144 >> {'loss': 0.2722, 'learning_rate': 3.2676e-05, 'epoch': 1.21, 'throughput': 602.59}
[INFO|2026-04-22 02:33:50] logging.py:144 >> {'loss': 0.2735, 'learning_rate': 3.2593e-05, 'epoch': 1.21, 'throughput': 602.82}
[INFO|2026-04-22 02:38:45] logging.py:144 >> {'loss': 0.2712, 'learning_rate': 3.2511e-05, 'epoch': 1.21, 'throughput': 603.09}
[INFO|2026-04-22 02:43:59] logging.py:144 >> {'loss': 0.2721, 'learning_rate': 3.2428e-05, 'epoch': 1.22, 'throughput': 603.27}
[INFO|2026-04-22 02:49:02] logging.py:144 >> {'loss': 0.2744, 'learning_rate': 3.2345e-05, 'epoch': 1.22, 'throughput': 603.49}
[INFO|2026-04-22 02:53:37] logging.py:144 >> {'loss': 0.2778, 'learning_rate': 3.2262e-05, 'epoch': 1.22, 'throughput': 603.78}
[INFO|2026-04-22 02:58:59] logging.py:144 >> {'loss': 0.2758, 'learning_rate': 3.2179e-05, 'epoch': 1.23, 'throughput': 603.87}
[INFO|2026-04-22 03:04:23] logging.py:144 >> {'loss': 0.2702, 'learning_rate': 3.2096e-05, 'epoch': 1.23, 'throughput': 603.97}
[INFO|2026-04-22 03:10:02] logging.py:144 >> {'loss': 0.2789, 'learning_rate': 3.2013e-05, 'epoch': 1.23, 'throughput': 604.05}
[INFO|2026-04-22 03:15:33] logging.py:144 >> {'loss': 0.2840, 'learning_rate': 3.1930e-05, 'epoch': 1.24, 'throughput': 604.12}
[INFO|2026-04-22 03:20:28] logging.py:144 >> {'loss': 0.2759, 'learning_rate': 3.1846e-05, 'epoch': 1.24, 'throughput': 604.37}
[INFO|2026-04-22 03:25:09] logging.py:144 >> {'loss': 0.2711, 'learning_rate': 3.1763e-05, 'epoch': 1.24, 'throughput': 604.66}
[INFO|2026-04-22 03:29:55] logging.py:144 >> {'loss': 0.2793, 'learning_rate': 3.1680e-05, 'epoch': 1.25, 'throughput': 604.93}
[INFO|2026-04-22 03:34:43] logging.py:144 >> {'loss': 0.2739, 'learning_rate': 3.1596e-05, 'epoch': 1.25, 'throughput': 605.20}
[INFO|2026-04-22 03:39:22] logging.py:144 >> {'loss': 0.2711, 'learning_rate': 3.1512e-05, 'epoch': 1.25, 'throughput': 605.54}
[INFO|2026-04-22 03:44:02] logging.py:144 >> {'loss': 0.2775, 'learning_rate': 3.1429e-05, 'epoch': 1.26, 'throughput': 605.83}
[INFO|2026-04-22 03:49:16] logging.py:144 >> {'loss': 0.2741, 'learning_rate': 3.1345e-05, 'epoch': 1.26, 'throughput': 605.97}
[INFO|2026-04-22 03:54:22] logging.py:144 >> {'loss': 0.2756, 'learning_rate': 3.1261e-05, 'epoch': 1.26, 'throughput': 606.14}
[INFO|2026-04-22 03:59:35] logging.py:144 >> {'loss': 0.2827, 'learning_rate': 3.1177e-05, 'epoch': 1.27, 'throughput': 606.30}
[INFO|2026-04-22 04:04:17] logging.py:144 >> {'loss': 0.2643, 'learning_rate': 3.1093e-05, 'epoch': 1.27, 'throughput': 606.57}
[INFO|2026-04-22 04:09:06] logging.py:144 >> {'loss': 0.2780, 'learning_rate': 3.1009e-05, 'epoch': 1.27, 'throughput': 606.79}
[INFO|2026-04-22 04:13:46] logging.py:144 >> {'loss': 0.2756, 'learning_rate': 3.0925e-05, 'epoch': 1.28, 'throughput': 607.07}
[INFO|2026-04-22 04:18:49] logging.py:144 >> {'loss': 0.2803, 'learning_rate': 3.0840e-05, 'epoch': 1.28, 'throughput': 607.31}
[INFO|2026-04-22 04:23:46] logging.py:144 >> {'loss': 0.2795, 'learning_rate': 3.0756e-05, 'epoch': 1.28, 'throughput': 607.56}
[INFO|2026-04-22 04:28:40] logging.py:144 >> {'loss': 0.2780, 'learning_rate': 3.0672e-05, 'epoch': 1.29, 'throughput': 607.81}
[INFO|2026-04-22 04:33:54] logging.py:144 >> {'loss': 0.2795, 'learning_rate': 3.0587e-05, 'epoch': 1.29, 'throughput': 608.01}
[INFO|2026-04-22 04:38:42] logging.py:144 >> {'loss': 0.2775, 'learning_rate': 3.0503e-05, 'epoch': 1.29, 'throughput': 608.23}
[INFO|2026-04-22 04:43:27] logging.py:144 >> {'loss': 0.2766, 'learning_rate': 3.0418e-05, 'epoch': 1.30, 'throughput': 608.53}
[INFO|2026-04-22 04:48:11] logging.py:144 >> {'loss': 0.2749, 'learning_rate': 3.0333e-05, 'epoch': 1.30, 'throughput': 608.78}
[INFO|2026-04-22 04:53:00] logging.py:144 >> {'loss': 0.2806, 'learning_rate': 3.0249e-05, 'epoch': 1.30, 'throughput': 608.99}
[INFO|2026-04-22 04:57:34] logging.py:144 >> {'loss': 0.2767, 'learning_rate': 3.0164e-05, 'epoch': 1.31, 'throughput': 609.30}
[INFO|2026-04-22 05:02:38] logging.py:144 >> {'loss': 0.2660, 'learning_rate': 3.0079e-05, 'epoch': 1.31, 'throughput': 609.48}
[INFO|2026-04-22 05:07:37] logging.py:144 >> {'loss': 0.2760, 'learning_rate': 2.9994e-05, 'epoch': 1.31, 'throughput': 609.72}
[INFO|2026-04-22 05:12:24] logging.py:144 >> {'loss': 0.2764, 'learning_rate': 2.9909e-05, 'epoch': 1.32, 'throughput': 609.98}
[INFO|2026-04-22 05:17:33] logging.py:144 >> {'loss': 0.2720, 'learning_rate': 2.9824e-05, 'epoch': 1.32, 'throughput': 610.16}
[INFO|2026-04-22 05:22:21] logging.py:144 >> {'loss': 0.2695, 'learning_rate': 2.9739e-05, 'epoch': 1.32, 'throughput': 610.43}
[INFO|2026-04-22 05:27:12] logging.py:144 >> {'loss': 0.2749, 'learning_rate': 2.9654e-05, 'epoch': 1.33, 'throughput': 610.62}
[INFO|2026-04-22 05:31:59] logging.py:144 >> {'loss': 0.2678, 'learning_rate': 2.9569e-05, 'epoch': 1.33, 'throughput': 610.89}
[INFO|2026-04-22 05:37:06] logging.py:144 >> {'loss': 0.2691, 'learning_rate': 2.9483e-05, 'epoch': 1.33, 'throughput': 611.08}
[INFO|2026-04-22 05:42:05] logging.py:144 >> {'loss': 0.2760, 'learning_rate': 2.9398e-05, 'epoch': 1.34, 'throughput': 611.28}
[INFO|2026-04-22 05:47:04] logging.py:144 >> {'loss': 0.2759, 'learning_rate': 2.9313e-05, 'epoch': 1.34, 'throughput': 611.49}
[INFO|2026-04-22 05:52:08] logging.py:144 >> {'loss': 0.2764, 'learning_rate': 2.9227e-05, 'epoch': 1.34, 'throughput': 611.67}
[INFO|2026-04-22 05:57:14] logging.py:144 >> {'loss': 0.2821, 'learning_rate': 2.9142e-05, 'epoch': 1.35, 'throughput': 611.83}
[INFO|2026-04-22 06:02:10] logging.py:144 >> {'loss': 0.2698, 'learning_rate': 2.9056e-05, 'epoch': 1.35, 'throughput': 612.07}
[INFO|2026-04-22 06:07:21] logging.py:144 >> {'loss': 0.2750, 'learning_rate': 2.8971e-05, 'epoch': 1.35, 'throughput': 612.22}
[INFO|2026-04-22 06:12:06] logging.py:144 >> {'loss': 0.2783, 'learning_rate': 2.8885e-05, 'epoch': 1.36, 'throughput': 612.47}
[INFO|2026-04-22 06:16:31] logging.py:144 >> {'loss': 0.2802, 'learning_rate': 2.8800e-05, 'epoch': 1.36, 'throughput': 612.78}
[INFO|2026-04-22 06:20:54] logging.py:144 >> {'loss': 0.2679, 'learning_rate': 2.8714e-05, 'epoch': 1.36, 'throughput': 613.10}
[INFO|2026-04-22 06:25:24] logging.py:144 >> {'loss': 0.2706, 'learning_rate': 2.8628e-05, 'epoch': 1.37, 'throughput': 613.44}
[INFO|2026-04-22 06:30:16] logging.py:144 >> {'loss': 0.2836, 'learning_rate': 2.8542e-05, 'epoch': 1.37, 'throughput': 613.64}
[INFO|2026-04-22 06:35:14] logging.py:144 >> {'loss': 0.2807, 'learning_rate': 2.8456e-05, 'epoch': 1.37, 'throughput': 613.81}
[INFO|2026-04-22 06:39:55] logging.py:144 >> {'loss': 0.2763, 'learning_rate': 2.8371e-05, 'epoch': 1.38, 'throughput': 614.08}
[INFO|2026-04-22 06:45:21] logging.py:144 >> {'loss': 0.2625, 'learning_rate': 2.8285e-05, 'epoch': 1.38, 'throughput': 614.11}
[INFO|2026-04-22 06:50:41] logging.py:144 >> {'loss': 0.2723, 'learning_rate': 2.8199e-05, 'epoch': 1.38, 'throughput': 614.20}
[INFO|2026-04-22 06:55:45] logging.py:144 >> {'loss': 0.2814, 'learning_rate': 2.8113e-05, 'epoch': 1.39, 'throughput': 614.37}
[INFO|2026-04-22 07:00:39] logging.py:144 >> {'loss': 0.2729, 'learning_rate': 2.8027e-05, 'epoch': 1.39, 'throughput': 614.58}
[INFO|2026-04-22 07:05:32] logging.py:144 >> {'loss': 0.2759, 'learning_rate': 2.7941e-05, 'epoch': 1.39, 'throughput': 614.84}
[INFO|2026-04-22 07:11:10] logging.py:144 >> {'loss': 0.2685, 'learning_rate': 2.7854e-05, 'epoch': 1.40, 'throughput': 614.85}
[INFO|2026-04-22 07:15:43] logging.py:144 >> {'loss': 0.2699, 'learning_rate': 2.7768e-05, 'epoch': 1.40, 'throughput': 615.11}
[INFO|2026-04-22 07:21:04] logging.py:144 >> {'loss': 0.2714, 'learning_rate': 2.7682e-05, 'epoch': 1.40, 'throughput': 615.20}
[INFO|2026-04-22 07:26:08] logging.py:144 >> {'loss': 0.2769, 'learning_rate': 2.7596e-05, 'epoch': 1.41, 'throughput': 615.38}
[INFO|2026-04-22 07:31:20] logging.py:144 >> {'loss': 0.2730, 'learning_rate': 2.7510e-05, 'epoch': 1.41, 'throughput': 615.54}
[INFO|2026-04-22 07:36:32] logging.py:144 >> {'loss': 0.2728, 'learning_rate': 2.7423e-05, 'epoch': 1.41, 'throughput': 615.68}
[INFO|2026-04-22 07:42:01] logging.py:144 >> {'loss': 0.2820, 'learning_rate': 2.7337e-05, 'epoch': 1.42, 'throughput': 615.73}
[INFO|2026-04-22 07:47:34] logging.py:144 >> {'loss': 0.2795, 'learning_rate': 2.7251e-05, 'epoch': 1.42, 'throughput': 615.74}
[INFO|2026-04-22 07:52:54] logging.py:144 >> {'loss': 0.2838, 'learning_rate': 2.7165e-05, 'epoch': 1.42, 'throughput': 615.86}
[INFO|2026-04-22 07:58:15] logging.py:144 >> {'loss': 0.2731, 'learning_rate': 2.7078e-05, 'epoch': 1.43, 'throughput': 615.92}
[INFO|2026-04-22 08:03:10] logging.py:144 >> {'loss': 0.2735, 'learning_rate': 2.6992e-05, 'epoch': 1.43, 'throughput': 616.13}
[INFO|2026-04-22 08:08:10] logging.py:144 >> {'loss': 0.2695, 'learning_rate': 2.6905e-05, 'epoch': 1.43, 'throughput': 616.30}
[INFO|2026-04-22 08:12:57] logging.py:144 >> {'loss': 0.2791, 'learning_rate': 2.6819e-05, 'epoch': 1.44, 'throughput': 616.52}
[INFO|2026-04-22 08:17:35] logging.py:144 >> {'loss': 0.2724, 'learning_rate': 2.6732e-05, 'epoch': 1.44, 'throughput': 616.79}
[INFO|2026-04-22 08:22:21] logging.py:144 >> {'loss': 0.2679, 'learning_rate': 2.6646e-05, 'epoch': 1.44, 'throughput': 616.97}
[INFO|2026-04-22 08:27:18] logging.py:144 >> {'loss': 0.2862, 'learning_rate': 2.6559e-05, 'epoch': 1.44, 'throughput': 617.15}
[INFO|2026-04-22 08:31:50] logging.py:144 >> {'loss': 0.2709, 'learning_rate': 2.6473e-05, 'epoch': 1.45, 'throughput': 617.41}
[INFO|2026-04-22 08:36:27] logging.py:144 >> {'loss': 0.2822, 'learning_rate': 2.6386e-05, 'epoch': 1.45, 'throughput': 617.67}
[INFO|2026-04-22 08:41:09] logging.py:144 >> {'loss': 0.2690, 'learning_rate': 2.6300e-05, 'epoch': 1.45, 'throughput': 617.90}
[INFO|2026-04-22 08:45:29] logging.py:144 >> {'loss': 0.2643, 'learning_rate': 2.6213e-05, 'epoch': 1.46, 'throughput': 618.23}
[INFO|2026-04-22 08:50:23] logging.py:144 >> {'loss': 0.2755, 'learning_rate': 2.6127e-05, 'epoch': 1.46, 'throughput': 618.40}
[INFO|2026-04-22 08:55:42] logging.py:144 >> {'loss': 0.2783, 'learning_rate': 2.6040e-05, 'epoch': 1.46, 'throughput': 618.49}
[INFO|2026-04-22 09:00:35] logging.py:144 >> {'loss': 0.2772, 'learning_rate': 2.5953e-05, 'epoch': 1.47, 'throughput': 618.66}
[INFO|2026-04-22 09:05:29] logging.py:144 >> {'loss': 0.2756, 'learning_rate': 2.5867e-05, 'epoch': 1.47, 'throughput': 618.83}
[INFO|2026-04-22 09:10:23] logging.py:144 >> {'loss': 0.2683, 'learning_rate': 2.5780e-05, 'epoch': 1.47, 'throughput': 619.05}
[INFO|2026-04-22 09:15:27] logging.py:144 >> {'loss': 0.2771, 'learning_rate': 2.5693e-05, 'epoch': 1.48, 'throughput': 619.21}
[INFO|2026-04-22 09:20:54] logging.py:144 >> {'loss': 0.2854, 'learning_rate': 2.5607e-05, 'epoch': 1.48, 'throughput': 619.27}
[INFO|2026-04-22 09:25:34] logging.py:144 >> {'loss': 0.2746, 'learning_rate': 2.5520e-05, 'epoch': 1.48, 'throughput': 619.50}
[INFO|2026-04-22 09:31:05] logging.py:144 >> {'loss': 0.2858, 'learning_rate': 2.5433e-05, 'epoch': 1.49, 'throughput': 619.59}
[INFO|2026-04-22 09:35:43] logging.py:144 >> {'loss': 0.2771, 'learning_rate': 2.5347e-05, 'epoch': 1.49, 'throughput': 619.83}
[INFO|2026-04-22 09:40:32] logging.py:144 >> {'loss': 0.2803, 'learning_rate': 2.5260e-05, 'epoch': 1.49, 'throughput': 620.04}
[INFO|2026-04-22 09:45:40] logging.py:144 >> {'loss': 0.2789, 'learning_rate': 2.5173e-05, 'epoch': 1.50, 'throughput': 620.20}
[INFO|2026-04-22 09:50:27] logging.py:144 >> {'loss': 0.2838, 'learning_rate': 2.5087e-05, 'epoch': 1.50, 'throughput': 620.45}
[INFO|2026-04-22 09:55:06] logging.py:144 >> {'loss': 0.2761, 'learning_rate': 2.5000e-05, 'epoch': 1.50, 'throughput': 620.70}
[INFO|2026-04-22 09:59:50] logging.py:144 >> {'loss': 0.2772, 'learning_rate': 2.4913e-05, 'epoch': 1.51, 'throughput': 620.93}
[INFO|2026-04-22 10:04:59] logging.py:144 >> {'loss': 0.2844, 'learning_rate': 2.4827e-05, 'epoch': 1.51, 'throughput': 621.03}
[INFO|2026-04-22 10:09:57] logging.py:144 >> {'loss': 0.2826, 'learning_rate': 2.4740e-05, 'epoch': 1.51, 'throughput': 621.20}
[INFO|2026-04-22 10:14:37] logging.py:144 >> {'loss': 0.2835, 'learning_rate': 2.4653e-05, 'epoch': 1.52, 'throughput': 621.43}
[INFO|2026-04-22 10:19:30] logging.py:144 >> {'loss': 0.2669, 'learning_rate': 2.4567e-05, 'epoch': 1.52, 'throughput': 621.59}
[INFO|2026-04-22 10:24:47] logging.py:144 >> {'loss': 0.2790, 'learning_rate': 2.4480e-05, 'epoch': 1.52, 'throughput': 621.67}
[INFO|2026-04-22 10:30:23] logging.py:144 >> {'loss': 0.2798, 'learning_rate': 2.4393e-05, 'epoch': 1.53, 'throughput': 621.66}
[INFO|2026-04-22 10:34:57] logging.py:144 >> {'loss': 0.2746, 'learning_rate': 2.4307e-05, 'epoch': 1.53, 'throughput': 621.91}
[INFO|2026-04-22 10:40:22] logging.py:144 >> {'loss': 0.2753, 'learning_rate': 2.4220e-05, 'epoch': 1.53, 'throughput': 621.98}
[INFO|2026-04-22 10:45:12] logging.py:144 >> {'loss': 0.2626, 'learning_rate': 2.4133e-05, 'epoch': 1.54, 'throughput': 622.15}
[INFO|2026-04-22 10:50:07] logging.py:144 >> {'loss': 0.2712, 'learning_rate': 2.4047e-05, 'epoch': 1.54, 'throughput': 622.35}
[INFO|2026-04-22 10:54:46] logging.py:144 >> {'loss': 0.2736, 'learning_rate': 2.3960e-05, 'epoch': 1.54, 'throughput': 622.53}
[INFO|2026-04-22 10:59:41] logging.py:144 >> {'loss': 0.2783, 'learning_rate': 2.3873e-05, 'epoch': 1.55, 'throughput': 622.63}
[INFO|2026-04-22 11:04:37] logging.py:144 >> {'loss': 0.2832, 'learning_rate': 2.3787e-05, 'epoch': 1.55, 'throughput': 622.77}
[INFO|2026-04-22 11:09:16] logging.py:144 >> {'loss': 0.2752, 'learning_rate': 2.3700e-05, 'epoch': 1.55, 'throughput': 622.99}
[INFO|2026-04-22 11:14:03] logging.py:144 >> {'loss': 0.2746, 'learning_rate': 2.3614e-05, 'epoch': 1.56, 'throughput': 623.21}
[INFO|2026-04-22 11:19:18] logging.py:144 >> {'loss': 0.2713, 'learning_rate': 2.3527e-05, 'epoch': 1.56, 'throughput': 623.32}
[INFO|2026-04-22 11:24:18] logging.py:144 >> {'loss': 0.2781, 'learning_rate': 2.3441e-05, 'epoch': 1.56, 'throughput': 623.43}
[INFO|2026-04-22 11:29:07] logging.py:144 >> {'loss': 0.2798, 'learning_rate': 2.3354e-05, 'epoch': 1.57, 'throughput': 623.63}
[INFO|2026-04-22 11:34:07] logging.py:144 >> {'loss': 0.2711, 'learning_rate': 2.3268e-05, 'epoch': 1.57, 'throughput': 623.80}
[INFO|2026-04-22 11:39:05] logging.py:144 >> {'loss': 0.2764, 'learning_rate': 2.3181e-05, 'epoch': 1.57, 'throughput': 623.95}
[INFO|2026-04-22 11:44:05] logging.py:144 >> {'loss': 0.2717, 'learning_rate': 2.3095e-05, 'epoch': 1.58, 'throughput': 624.11}
[INFO|2026-04-22 11:49:10] logging.py:144 >> {'loss': 0.2778, 'learning_rate': 2.3008e-05, 'epoch': 1.58, 'throughput': 624.26}
[INFO|2026-04-22 11:53:39] logging.py:144 >> {'loss': 0.2831, 'learning_rate': 2.2922e-05, 'epoch': 1.58, 'throughput': 624.50}
[INFO|2026-04-22 11:58:52] logging.py:144 >> {'loss': 0.2885, 'learning_rate': 2.2835e-05, 'epoch': 1.59, 'throughput': 624.64}
[INFO|2026-04-22 12:03:54] logging.py:144 >> {'loss': 0.2737, 'learning_rate': 2.2749e-05, 'epoch': 1.59, 'throughput': 624.79}
[INFO|2026-04-22 12:09:06] logging.py:144 >> {'loss': 0.2853, 'learning_rate': 2.2663e-05, 'epoch': 1.59, 'throughput': 624.87}
[INFO|2026-04-22 12:13:44] logging.py:144 >> {'loss': 0.2681, 'learning_rate': 2.2577e-05, 'epoch': 1.60, 'throughput': 625.07}
[INFO|2026-04-22 12:18:51] logging.py:144 >> {'loss': 0.2740, 'learning_rate': 2.2490e-05, 'epoch': 1.60, 'throughput': 625.19}
[INFO|2026-04-22 12:24:14] logging.py:144 >> {'loss': 0.2846, 'learning_rate': 2.2404e-05, 'epoch': 1.60, 'throughput': 625.24}
[INFO|2026-04-22 12:29:24] logging.py:144 >> {'loss': 0.2726, 'learning_rate': 2.2318e-05, 'epoch': 1.61, 'throughput': 625.36}
[INFO|2026-04-22 12:34:34] logging.py:144 >> {'loss': 0.2739, 'learning_rate': 2.2232e-05, 'epoch': 1.61, 'throughput': 625.47}
[INFO|2026-04-22 12:40:56] logging.py:144 >> {'loss': 0.2869, 'learning_rate': 2.2146e-05, 'epoch': 1.61, 'throughput': 625.34}
[INFO|2026-04-22 12:46:25] logging.py:144 >> {'loss': 0.2681, 'learning_rate': 2.2059e-05, 'epoch': 1.62, 'throughput': 625.35}
[INFO|2026-04-22 12:51:43] logging.py:144 >> {'loss': 0.2726, 'learning_rate': 2.1973e-05, 'epoch': 1.62, 'throughput': 625.39}
[INFO|2026-04-22 12:57:22] logging.py:144 >> {'loss': 0.2802, 'learning_rate': 2.1887e-05, 'epoch': 1.62, 'throughput': 625.37}
[INFO|2026-04-22 13:02:52] logging.py:144 >> {'loss': 0.2742, 'learning_rate': 2.1801e-05, 'epoch': 1.63, 'throughput': 625.41}
[INFO|2026-04-22 13:08:36] logging.py:144 >> {'loss': 0.2781, 'learning_rate': 2.1715e-05, 'epoch': 1.63, 'throughput': 625.40}
[INFO|2026-04-22 13:13:35] logging.py:144 >> {'loss': 0.2749, 'learning_rate': 2.1629e-05, 'epoch': 1.63, 'throughput': 625.52}
[INFO|2026-04-22 13:18:28] logging.py:144 >> {'loss': 0.2709, 'learning_rate': 2.1544e-05, 'epoch': 1.64, 'throughput': 625.71}
[INFO|2026-04-22 13:23:45] logging.py:144 >> {'loss': 0.2760, 'learning_rate': 2.1458e-05, 'epoch': 1.64, 'throughput': 625.77}
[INFO|2026-04-22 13:29:30] logging.py:144 >> {'loss': 0.2771, 'learning_rate': 2.1372e-05, 'epoch': 1.64, 'throughput': 625.74}
[INFO|2026-04-22 13:34:51] logging.py:144 >> {'loss': 0.2765, 'learning_rate': 2.1286e-05, 'epoch': 1.65, 'throughput': 625.79}
[INFO|2026-04-22 13:40:24] logging.py:144 >> {'loss': 0.2843, 'learning_rate': 2.1200e-05, 'epoch': 1.65, 'throughput': 625.83}
[INFO|2026-04-22 13:45:31] logging.py:144 >> {'loss': 0.2711, 'learning_rate': 2.1115e-05, 'epoch': 1.65, 'throughput': 625.94}
[INFO|2026-04-22 13:50:34] logging.py:144 >> {'loss': 0.2692, 'learning_rate': 2.1029e-05, 'epoch': 1.66, 'throughput': 626.06}
[INFO|2026-04-22 13:55:41] logging.py:144 >> {'loss': 0.2654, 'learning_rate': 2.0944e-05, 'epoch': 1.66, 'throughput': 626.16}
[INFO|2026-04-22 14:00:45] logging.py:144 >> {'loss': 0.2682, 'learning_rate': 2.0858e-05, 'epoch': 1.66, 'throughput': 626.26}
[INFO|2026-04-22 14:05:44] logging.py:144 >> {'loss': 0.2751, 'learning_rate': 2.0773e-05, 'epoch': 1.67, 'throughput': 626.36}
[INFO|2026-04-22 14:11:11] logging.py:144 >> {'loss': 0.2742, 'learning_rate': 2.0687e-05, 'epoch': 1.67, 'throughput': 626.40}
[INFO|2026-04-22 14:16:42] logging.py:144 >> {'loss': 0.2642, 'learning_rate': 2.0602e-05, 'epoch': 1.67, 'throughput': 626.41}
[INFO|2026-04-22 14:21:40] logging.py:144 >> {'loss': 0.2754, 'learning_rate': 2.0517e-05, 'epoch': 1.68, 'throughput': 626.55}
[INFO|2026-04-22 14:27:17] logging.py:144 >> {'loss': 0.2714, 'learning_rate': 2.0431e-05, 'epoch': 1.68, 'throughput': 626.56}
[INFO|2026-04-22 14:32:20] logging.py:144 >> {'loss': 0.2775, 'learning_rate': 2.0346e-05, 'epoch': 1.68, 'throughput': 626.68}
[INFO|2026-04-22 14:37:43] logging.py:144 >> {'loss': 0.2712, 'learning_rate': 2.0261e-05, 'epoch': 1.69, 'throughput': 626.70}
[INFO|2026-04-22 14:43:06] logging.py:144 >> {'loss': 0.2700, 'learning_rate': 2.0176e-05, 'epoch': 1.69, 'throughput': 626.74}
[INFO|2026-04-22 14:48:18] logging.py:144 >> {'loss': 0.2681, 'learning_rate': 2.0091e-05, 'epoch': 1.69, 'throughput': 626.77}
[INFO|2026-04-22 14:53:57] logging.py:144 >> {'loss': 0.2688, 'learning_rate': 2.0006e-05, 'epoch': 1.70, 'throughput': 626.76}
[INFO|2026-04-22 14:59:50] logging.py:144 >> {'loss': 0.2731, 'learning_rate': 1.9921e-05, 'epoch': 1.70, 'throughput': 626.73}
[INFO|2026-04-22 15:05:23] logging.py:144 >> {'loss': 0.2757, 'learning_rate': 1.9836e-05, 'epoch': 1.70, 'throughput': 626.75}
[INFO|2026-04-22 15:10:25] logging.py:144 >> {'loss': 0.2749, 'learning_rate': 1.9751e-05, 'epoch': 1.71, 'throughput': 626.87}
[INFO|2026-04-22 15:15:37] logging.py:144 >> {'loss': 0.2615, 'learning_rate': 1.9667e-05, 'epoch': 1.71, 'throughput': 626.90}
[INFO|2026-04-22 15:20:29] logging.py:144 >> {'loss': 0.2653, 'learning_rate': 1.9582e-05, 'epoch': 1.71, 'throughput': 627.11}
[INFO|2026-04-22 15:25:27] logging.py:144 >> {'loss': 0.2724, 'learning_rate': 1.9497e-05, 'epoch': 1.72, 'throughput': 627.27}
[INFO|2026-04-22 15:30:52] logging.py:144 >> {'loss': 0.2700, 'learning_rate': 1.9413e-05, 'epoch': 1.72, 'throughput': 627.32}
[INFO|2026-04-22 15:36:49] logging.py:144 >> {'loss': 0.2725, 'learning_rate': 1.9328e-05, 'epoch': 1.72, 'throughput': 627.30}
[INFO|2026-04-22 15:42:07] logging.py:144 >> {'loss': 0.2808, 'learning_rate': 1.9244e-05, 'epoch': 1.73, 'throughput': 627.36}
[INFO|2026-04-22 15:47:30] logging.py:144 >> {'loss': 0.2855, 'learning_rate': 1.9160e-05, 'epoch': 1.73, 'throughput': 627.42}
[INFO|2026-04-22 15:52:49] logging.py:144 >> {'loss': 0.2715, 'learning_rate': 1.9075e-05, 'epoch': 1.73, 'throughput': 627.47}
[INFO|2026-04-22 15:57:56] logging.py:144 >> {'loss': 0.2759, 'learning_rate': 1.8991e-05, 'epoch': 1.74, 'throughput': 627.55}
[INFO|2026-04-22 16:02:54] logging.py:144 >> {'loss': 0.2638, 'learning_rate': 1.8907e-05, 'epoch': 1.74, 'throughput': 627.64}
[INFO|2026-04-22 16:08:04] logging.py:144 >> {'loss': 0.2714, 'learning_rate': 1.8823e-05, 'epoch': 1.74, 'throughput': 627.73}
[INFO|2026-04-22 16:13:07] logging.py:144 >> {'loss': 0.2713, 'learning_rate': 1.8739e-05, 'epoch': 1.75, 'throughput': 627.83}
[INFO|2026-04-22 16:18:23] logging.py:144 >> {'loss': 0.2655, 'learning_rate': 1.8655e-05, 'epoch': 1.75, 'throughput': 627.88}
[INFO|2026-04-22 16:23:06] logging.py:144 >> {'loss': 0.2776, 'learning_rate': 1.8571e-05, 'epoch': 1.75, 'throughput': 628.02}
[INFO|2026-04-22 16:28:22] logging.py:144 >> {'loss': 0.2751, 'learning_rate': 1.8488e-05, 'epoch': 1.76, 'throughput': 628.10}
[INFO|2026-04-22 16:33:32] logging.py:144 >> {'loss': 0.2610, 'learning_rate': 1.8404e-05, 'epoch': 1.76, 'throughput': 628.19}
[INFO|2026-04-22 16:38:12] logging.py:144 >> {'loss': 0.2681, 'learning_rate': 1.8320e-05, 'epoch': 1.76, 'throughput': 628.41}
[INFO|2026-04-22 16:42:51] logging.py:144 >> {'loss': 0.2629, 'learning_rate': 1.8237e-05, 'epoch': 1.77, 'throughput': 628.58}
[INFO|2026-04-22 16:48:07] logging.py:144 >> {'loss': 0.2791, 'learning_rate': 1.8154e-05, 'epoch': 1.77, 'throughput': 628.63}
[INFO|2026-04-22 16:53:17] logging.py:144 >> {'loss': 0.2694, 'learning_rate': 1.8070e-05, 'epoch': 1.77, 'throughput': 628.70}
[INFO|2026-04-22 16:58:34] logging.py:144 >> {'loss': 0.2819, 'learning_rate': 1.7987e-05, 'epoch': 1.78, 'throughput': 628.79}
[INFO|2026-04-22 17:03:51] logging.py:144 >> {'loss': 0.2715, 'learning_rate': 1.7904e-05, 'epoch': 1.78, 'throughput': 628.88}
[INFO|2026-04-22 17:08:58] logging.py:144 >> {'loss': 0.2701, 'learning_rate': 1.7821e-05, 'epoch': 1.78, 'throughput': 629.00}
[INFO|2026-04-22 17:13:59] logging.py:144 >> {'loss': 0.2697, 'learning_rate': 1.7738e-05, 'epoch': 1.79, 'throughput': 629.09}
[INFO|2026-04-22 17:19:26] logging.py:144 >> {'loss': 0.2661, 'learning_rate': 1.7655e-05, 'epoch': 1.79, 'throughput': 629.09}
[INFO|2026-04-22 17:24:24] logging.py:144 >> {'loss': 0.2615, 'learning_rate': 1.7572e-05, 'epoch': 1.79, 'throughput': 629.13}
[INFO|2026-04-22 17:29:46] logging.py:144 >> {'loss': 0.2684, 'learning_rate': 1.7489e-05, 'epoch': 1.80, 'throughput': 629.16}
[INFO|2026-04-22 17:35:02] logging.py:144 >> {'loss': 0.2744, 'learning_rate': 1.7407e-05, 'epoch': 1.80, 'throughput': 629.21}
[INFO|2026-04-22 17:40:56] logging.py:144 >> {'loss': 0.2674, 'learning_rate': 1.7324e-05, 'epoch': 1.80, 'throughput': 629.19}
[INFO|2026-04-22 17:46:16] logging.py:144 >> {'loss': 0.2689, 'learning_rate': 1.7242e-05, 'epoch': 1.81, 'throughput': 629.23}
[INFO|2026-04-22 17:51:36] logging.py:144 >> {'loss': 0.2678, 'learning_rate': 1.7159e-05, 'epoch': 1.81, 'throughput': 629.28}
[INFO|2026-04-22 17:57:23] logging.py:144 >> {'loss': 0.2695, 'learning_rate': 1.7077e-05, 'epoch': 1.81, 'throughput': 629.25}
[INFO|2026-04-22 18:02:52] logging.py:144 >> {'loss': 0.2697, 'learning_rate': 1.6995e-05, 'epoch': 1.82, 'throughput': 629.28}
[INFO|2026-04-22 18:07:58] logging.py:144 >> {'loss': 0.2665, 'learning_rate': 1.6913e-05, 'epoch': 1.82, 'throughput': 629.39}
[INFO|2026-04-22 18:12:56] logging.py:144 >> {'loss': 0.2646, 'learning_rate': 1.6831e-05, 'epoch': 1.82, 'throughput': 629.49}
[INFO|2026-04-22 18:18:10] logging.py:144 >> {'loss': 0.2656, 'learning_rate': 1.6749e-05, 'epoch': 1.83, 'throughput': 629.57}
[INFO|2026-04-22 18:23:12] logging.py:144 >> {'loss': 0.2727, 'learning_rate': 1.6667e-05, 'epoch': 1.83, 'throughput': 629.67}
[INFO|2026-04-22 18:28:16] logging.py:144 >> {'loss': 0.2620, 'learning_rate': 1.6585e-05, 'epoch': 1.83, 'throughput': 629.75}
[INFO|2026-04-22 18:33:23] logging.py:144 >> {'loss': 0.2669, 'learning_rate': 1.6504e-05, 'epoch': 1.84, 'throughput': 629.83}
[INFO|2026-04-22 18:38:33] logging.py:144 >> {'loss': 0.2731, 'learning_rate': 1.6422e-05, 'epoch': 1.84, 'throughput': 629.92}
[INFO|2026-04-22 18:43:43] logging.py:144 >> {'loss': 0.2827, 'learning_rate': 1.6341e-05, 'epoch': 1.84, 'throughput': 630.02}
[INFO|2026-04-22 18:48:43] logging.py:144 >> {'loss': 0.2710, 'learning_rate': 1.6260e-05, 'epoch': 1.85, 'throughput': 630.14}
[INFO|2026-04-22 18:53:59] logging.py:144 >> {'loss': 0.2703, 'learning_rate': 1.6179e-05, 'epoch': 1.85, 'throughput': 630.16}
[INFO|2026-04-22 18:59:06] logging.py:144 >> {'loss': 0.2696, 'learning_rate': 1.6097e-05, 'epoch': 1.85, 'throughput': 630.30}
[INFO|2026-04-22 19:04:06] logging.py:144 >> {'loss': 0.2752, 'learning_rate': 1.6017e-05, 'epoch': 1.86, 'throughput': 630.42}
[INFO|2026-04-22 19:09:20] logging.py:144 >> {'loss': 0.2834, 'learning_rate': 1.5936e-05, 'epoch': 1.86, 'throughput': 630.51}
[INFO|2026-04-22 19:14:52] logging.py:144 >> {'loss': 0.2709, 'learning_rate': 1.5855e-05, 'epoch': 1.86, 'throughput': 630.53}
[INFO|2026-04-22 19:20:40] logging.py:144 >> {'loss': 0.2765, 'learning_rate': 1.5774e-05, 'epoch': 1.87, 'throughput': 630.46}
[INFO|2026-04-22 19:26:12] logging.py:144 >> {'loss': 0.2655, 'learning_rate': 1.5694e-05, 'epoch': 1.87, 'throughput': 630.46}
[INFO|2026-04-22 19:31:21] logging.py:144 >> {'loss': 0.2681, 'learning_rate': 1.5613e-05, 'epoch': 1.87, 'throughput': 630.54}
[INFO|2026-04-22 19:36:51] logging.py:144 >> {'loss': 0.2690, 'learning_rate': 1.5533e-05, 'epoch': 1.88, 'throughput': 630.56}
[INFO|2026-04-22 19:42:03] logging.py:144 >> {'loss': 0.2721, 'learning_rate': 1.5453e-05, 'epoch': 1.88, 'throughput': 630.65}
[INFO|2026-04-22 19:47:08] logging.py:144 >> {'loss': 0.2736, 'learning_rate': 1.5373e-05, 'epoch': 1.88, 'throughput': 630.74}
[INFO|2026-04-22 19:52:26] logging.py:144 >> {'loss': 0.2697, 'learning_rate': 1.5293e-05, 'epoch': 1.89, 'throughput': 630.75}
[INFO|2026-04-22 19:57:48] logging.py:144 >> {'loss': 0.2625, 'learning_rate': 1.5213e-05, 'epoch': 1.89, 'throughput': 630.76}
[INFO|2026-04-22 20:03:27] logging.py:144 >> {'loss': 0.2724, 'learning_rate': 1.5133e-05, 'epoch': 1.89, 'throughput': 630.76}
[INFO|2026-04-22 20:08:23] logging.py:144 >> {'loss': 0.2690, 'learning_rate': 1.5054e-05, 'epoch': 1.90, 'throughput': 630.84}
[INFO|2026-04-22 20:13:38] logging.py:144 >> {'loss': 0.2695, 'learning_rate': 1.4974e-05, 'epoch': 1.90, 'throughput': 630.92}
[INFO|2026-04-22 20:18:38] logging.py:144 >> {'loss': 0.2766, 'learning_rate': 1.4895e-05, 'epoch': 1.90, 'throughput': 631.04}
[INFO|2026-04-22 20:24:00] logging.py:144 >> {'loss': 0.2619, 'learning_rate': 1.4816e-05, 'epoch': 1.91, 'throughput': 631.05}
[INFO|2026-04-22 20:29:38] logging.py:144 >> {'loss': 0.2650, 'learning_rate': 1.4737e-05, 'epoch': 1.91, 'throughput': 631.02}
[INFO|2026-04-22 20:34:53] logging.py:144 >> {'loss': 0.2748, 'learning_rate': 1.4658e-05, 'epoch': 1.91, 'throughput': 631.08}
[INFO|2026-04-22 20:40:12] logging.py:144 >> {'loss': 0.2685, 'learning_rate': 1.4579e-05, 'epoch': 1.92, 'throughput': 631.10}
[INFO|2026-04-22 20:45:37] logging.py:144 >> {'loss': 0.2697, 'learning_rate': 1.4500e-05, 'epoch': 1.92, 'throughput': 631.11}
[INFO|2026-04-22 20:51:05] logging.py:144 >> {'loss': 0.2712, 'learning_rate': 1.4421e-05, 'epoch': 1.92, 'throughput': 631.11}
[INFO|2026-04-22 20:56:13] logging.py:144 >> {'loss': 0.2671, 'learning_rate': 1.4343e-05, 'epoch': 1.93, 'throughput': 631.18}
[INFO|2026-04-22 21:01:56] logging.py:144 >> {'loss': 0.2803, 'learning_rate': 1.4265e-05, 'epoch': 1.93, 'throughput': 631.17}
[INFO|2026-04-22 21:07:33] logging.py:144 >> {'loss': 0.2650, 'learning_rate': 1.4186e-05, 'epoch': 1.93, 'throughput': 631.15}
[INFO|2026-04-22 21:13:07] logging.py:144 >> {'loss': 0.2689, 'learning_rate': 1.4108e-05, 'epoch': 1.94, 'throughput': 631.14}
[INFO|2026-04-22 21:19:26] logging.py:144 >> {'loss': 0.2746, 'learning_rate': 1.4030e-05, 'epoch': 1.94, 'throughput': 631.00}
[INFO|2026-04-22 21:25:00] logging.py:144 >> {'loss': 0.2663, 'learning_rate': 1.3953e-05, 'epoch': 1.94, 'throughput': 631.00}
[INFO|2026-04-22 21:31:13] logging.py:144 >> {'loss': 0.2749, 'learning_rate': 1.3875e-05, 'epoch': 1.95, 'throughput': 630.90}
[INFO|2026-04-22 21:37:00] logging.py:144 >> {'loss': 0.2666, 'learning_rate': 1.3797e-05, 'epoch': 1.95, 'throughput': 630.85}
[INFO|2026-04-22 21:42:44] logging.py:144 >> {'loss': 0.2752, 'learning_rate': 1.3720e-05, 'epoch': 1.95, 'throughput': 630.82}
[INFO|2026-04-22 21:47:59] logging.py:144 >> {'loss': 0.2689, 'learning_rate': 1.3643e-05, 'epoch': 1.96, 'throughput': 630.85}
[INFO|2026-04-22 21:53:11] logging.py:144 >> {'loss': 0.2695, 'learning_rate': 1.3565e-05, 'epoch': 1.96, 'throughput': 630.91}
[INFO|2026-04-22 21:58:30] logging.py:144 >> {'loss': 0.2679, 'learning_rate': 1.3488e-05, 'epoch': 1.96, 'throughput': 630.96}
[INFO|2026-04-22 22:03:42] logging.py:144 >> {'loss': 0.2621, 'learning_rate': 1.3411e-05, 'epoch': 1.97, 'throughput': 631.02}
[INFO|2026-04-22 22:09:01] logging.py:144 >> {'loss': 0.2641, 'learning_rate': 1.3335e-05, 'epoch': 1.97, 'throughput': 631.07}
[INFO|2026-04-22 22:14:29] logging.py:144 >> {'loss': 0.2690, 'learning_rate': 1.3258e-05, 'epoch': 1.97, 'throughput': 631.07}
[INFO|2026-04-22 22:19:48] logging.py:144 >> {'loss': 0.2604, 'learning_rate': 1.3182e-05, 'epoch': 1.98, 'throughput': 631.09}
[INFO|2026-04-22 22:25:24] logging.py:144 >> {'loss': 0.2704, 'learning_rate': 1.3105e-05, 'epoch': 1.98, 'throughput': 631.05}
[INFO|2026-04-22 22:31:06] logging.py:144 >> {'loss': 0.2645, 'learning_rate': 1.3029e-05, 'epoch': 1.98, 'throughput': 631.03}
[INFO|2026-04-22 22:36:30] logging.py:144 >> {'loss': 0.2671, 'learning_rate': 1.2953e-05, 'epoch': 1.99, 'throughput': 631.09}
[INFO|2026-04-22 22:41:22] logging.py:144 >> {'loss': 0.2673, 'learning_rate': 1.2877e-05, 'epoch': 1.99, 'throughput': 631.23}
[INFO|2026-04-22 22:46:30] logging.py:144 >> {'loss': 0.2651, 'learning_rate': 1.2801e-05, 'epoch': 1.99, 'throughput': 631.31}
[INFO|2026-04-22 22:51:40] logging.py:144 >> {'loss': 0.2725, 'learning_rate': 1.2726e-05, 'epoch': 2.00, 'throughput': 631.39}
[INFO|2026-04-22 22:57:24] logging.py:144 >> {'loss': 0.2708, 'learning_rate': 1.2650e-05, 'epoch': 2.00, 'throughput': 631.35}
[INFO|2026-04-22 22:58:16] logging.py:144 >> {'loss': 0.2281, 'learning_rate': 1.2575e-05, 'epoch': 2.00, 'throughput': 631.32}
[INFO|2026-04-22 23:03:24] logging.py:144 >> {'loss': 0.1472, 'learning_rate': 1.2500e-05, 'epoch': 2.00, 'throughput': 631.41}
[INFO|2026-04-22 23:08:44] logging.py:144 >> {'loss': 0.1462, 'learning_rate': 1.2425e-05, 'epoch': 2.01, 'throughput': 631.49}
[INFO|2026-04-22 23:14:20] logging.py:144 >> {'loss': 0.1338, 'learning_rate': 1.2350e-05, 'epoch': 2.01, 'throughput': 631.49}
[INFO|2026-04-22 23:19:41] logging.py:144 >> {'loss': 0.1428, 'learning_rate': 1.2275e-05, 'epoch': 2.01, 'throughput': 631.52}
[INFO|2026-04-22 23:24:56] logging.py:144 >> {'loss': 0.1372, 'learning_rate': 1.2201e-05, 'epoch': 2.02, 'throughput': 631.58}
[INFO|2026-04-22 23:30:26] logging.py:144 >> {'loss': 0.1320, 'learning_rate': 1.2127e-05, 'epoch': 2.02, 'throughput': 631.57}
[INFO|2026-04-22 23:36:12] logging.py:144 >> {'loss': 0.1346, 'learning_rate': 1.2052e-05, 'epoch': 2.02, 'throughput': 631.52}
[INFO|2026-04-22 23:41:56] logging.py:144 >> {'loss': 0.1360, 'learning_rate': 1.1978e-05, 'epoch': 2.03, 'throughput': 631.50}
[INFO|2026-04-22 23:47:35] logging.py:144 >> {'loss': 0.1335, 'learning_rate': 1.1904e-05, 'epoch': 2.03, 'throughput': 631.48}
[INFO|2026-04-22 23:52:32] logging.py:144 >> {'loss': 0.1315, 'learning_rate': 1.1831e-05, 'epoch': 2.03, 'throughput': 631.60}
[INFO|2026-04-22 23:57:59] logging.py:144 >> {'loss': 0.1338, 'learning_rate': 1.1757e-05, 'epoch': 2.04, 'throughput': 631.64}
[INFO|2026-04-23 00:03:34] logging.py:144 >> {'loss': 0.1398, 'learning_rate': 1.1683e-05, 'epoch': 2.04, 'throughput': 631.68}
[INFO|2026-04-23 00:08:50] logging.py:144 >> {'loss': 0.1319, 'learning_rate': 1.1610e-05, 'epoch': 2.04, 'throughput': 631.72}
[INFO|2026-04-23 00:14:01] logging.py:144 >> {'loss': 0.1284, 'learning_rate': 1.1537e-05, 'epoch': 2.05, 'throughput': 631.80}
[INFO|2026-04-23 00:19:11] logging.py:144 >> {'loss': 0.1281, 'learning_rate': 1.1464e-05, 'epoch': 2.05, 'throughput': 631.88}
[INFO|2026-04-23 00:24:04] logging.py:144 >> {'loss': 0.1296, 'learning_rate': 1.1391e-05, 'epoch': 2.05, 'throughput': 632.00}
[INFO|2026-04-23 00:28:54] logging.py:144 >> {'loss': 0.1244, 'learning_rate': 1.1319e-05, 'epoch': 2.06, 'throughput': 632.12}
[INFO|2026-04-23 00:33:49] logging.py:144 >> {'loss': 0.1292, 'learning_rate': 1.1246e-05, 'epoch': 2.06, 'throughput': 632.19}
[INFO|2026-04-23 00:38:59] logging.py:144 >> {'loss': 0.1297, 'learning_rate': 1.1174e-05, 'epoch': 2.06, 'throughput': 632.30}
[INFO|2026-04-23 00:44:24] logging.py:144 >> {'loss': 0.1284, 'learning_rate': 1.1102e-05, 'epoch': 2.07, 'throughput': 632.29}
[INFO|2026-04-23 00:49:32] logging.py:144 >> {'loss': 0.1358, 'learning_rate': 1.1030e-05, 'epoch': 2.07, 'throughput': 632.37}
[INFO|2026-04-23 00:54:53] logging.py:144 >> {'loss': 0.1323, 'learning_rate': 1.0958e-05, 'epoch': 2.07, 'throughput': 632.41}
[INFO|2026-04-23 00:59:54] logging.py:144 >> {'loss': 0.1297, 'learning_rate': 1.0886e-05, 'epoch': 2.08, 'throughput': 632.50}
[INFO|2026-04-23 01:04:30] logging.py:144 >> {'loss': 0.1265, 'learning_rate': 1.0815e-05, 'epoch': 2.08, 'throughput': 632.67}
[INFO|2026-04-23 01:09:26] logging.py:144 >> {'loss': 0.1232, 'learning_rate': 1.0744e-05, 'epoch': 2.08, 'throughput': 632.74}
[INFO|2026-04-23 01:14:28] logging.py:144 >> {'loss': 0.1288, 'learning_rate': 1.0672e-05, 'epoch': 2.09, 'throughput': 632.83}
[INFO|2026-04-23 01:19:22] logging.py:144 >> {'loss': 0.1238, 'learning_rate': 1.0601e-05, 'epoch': 2.09, 'throughput': 632.93}
[INFO|2026-04-23 01:24:08] logging.py:144 >> {'loss': 0.1305, 'learning_rate': 1.0531e-05, 'epoch': 2.09, 'throughput': 633.07}
[INFO|2026-04-23 01:28:52] logging.py:144 >> {'loss': 0.1264, 'learning_rate': 1.0460e-05, 'epoch': 2.10, 'throughput': 633.20}
[INFO|2026-04-23 01:33:56] logging.py:144 >> {'loss': 0.1272, 'learning_rate': 1.0390e-05, 'epoch': 2.10, 'throughput': 633.27}
[INFO|2026-04-23 01:38:27] logging.py:144 >> {'loss': 0.1279, 'learning_rate': 1.0319e-05, 'epoch': 2.10, 'throughput': 633.45}
[INFO|2026-04-23 01:43:36] logging.py:144 >> {'loss': 0.1224, 'learning_rate': 1.0249e-05, 'epoch': 2.11, 'throughput': 633.49}
[INFO|2026-04-23 01:48:46] logging.py:144 >> {'loss': 0.1218, 'learning_rate': 1.0179e-05, 'epoch': 2.11, 'throughput': 633.55}
[INFO|2026-04-23 01:54:19] logging.py:144 >> {'loss': 0.1314, 'learning_rate': 1.0110e-05, 'epoch': 2.11, 'throughput': 633.55}
[INFO|2026-04-23 01:59:33] logging.py:144 >> {'loss': 0.1247, 'learning_rate': 1.0040e-05, 'epoch': 2.12, 'throughput': 633.60}
[INFO|2026-04-23 02:04:59] logging.py:144 >> {'loss': 0.1183, 'learning_rate': 9.9708e-06, 'epoch': 2.12, 'throughput': 633.59}
[INFO|2026-04-23 02:10:37] logging.py:144 >> {'loss': 0.1282, 'learning_rate': 9.9016e-06, 'epoch': 2.12, 'throughput': 633.57}
[INFO|2026-04-23 02:16:13] logging.py:144 >> {'loss': 0.1309, 'learning_rate': 9.8326e-06, 'epoch': 2.13, 'throughput': 633.56}
[INFO|2026-04-23 02:21:50] logging.py:144 >> {'loss': 0.1256, 'learning_rate': 9.7638e-06, 'epoch': 2.13, 'throughput': 633.53}
[INFO|2026-04-23 02:27:13] logging.py:144 >> {'loss': 0.1184, 'learning_rate': 9.6951e-06, 'epoch': 2.13, 'throughput': 633.54}
[INFO|2026-04-23 02:32:01] logging.py:144 >> {'loss': 0.1208, 'learning_rate': 9.6267e-06, 'epoch': 2.14, 'throughput': 633.65}
[INFO|2026-04-23 02:37:17] logging.py:144 >> {'loss': 0.1253, 'learning_rate': 9.5584e-06, 'epoch': 2.14, 'throughput': 633.70}
[INFO|2026-04-23 02:42:48] logging.py:144 >> {'loss': 0.1202, 'learning_rate': 9.4903e-06, 'epoch': 2.14, 'throughput': 633.68}
[INFO|2026-04-23 02:48:04] logging.py:144 >> {'loss': 0.1316, 'learning_rate': 9.4224e-06, 'epoch': 2.15, 'throughput': 633.75}
[INFO|2026-04-23 02:53:09] logging.py:144 >> {'loss': 0.1205, 'learning_rate': 9.3547e-06, 'epoch': 2.15, 'throughput': 633.80}
[INFO|2026-04-23 02:58:03] logging.py:144 >> {'loss': 0.1210, 'learning_rate': 9.2872e-06, 'epoch': 2.15, 'throughput': 633.89}
[INFO|2026-04-23 03:02:38] logging.py:144 >> {'loss': 0.1242, 'learning_rate': 9.2199e-06, 'epoch': 2.16, 'throughput': 634.04}
[INFO|2026-04-23 03:07:46] logging.py:144 >> {'loss': 0.1238, 'learning_rate': 9.1527e-06, 'epoch': 2.16, 'throughput': 634.11}
[INFO|2026-04-23 03:12:43] logging.py:144 >> {'loss': 0.1307, 'learning_rate': 9.0858e-06, 'epoch': 2.16, 'throughput': 634.24}
[INFO|2026-04-23 03:17:46] logging.py:144 >> {'loss': 0.1268, 'learning_rate': 9.0190e-06, 'epoch': 2.17, 'throughput': 634.33}
[INFO|2026-04-23 03:22:41] logging.py:144 >> {'loss': 0.1194, 'learning_rate': 8.9525e-06, 'epoch': 2.17, 'throughput': 634.44}
[INFO|2026-04-23 03:27:29] logging.py:144 >> {'loss': 0.1315, 'learning_rate': 8.8861e-06, 'epoch': 2.17, 'throughput': 634.55}
[INFO|2026-04-23 03:32:35] logging.py:144 >> {'loss': 0.1193, 'learning_rate': 8.8199e-06, 'epoch': 2.18, 'throughput': 634.58}
[INFO|2026-04-23 03:37:35] logging.py:144 >> {'loss': 0.1318, 'learning_rate': 8.7539e-06, 'epoch': 2.18, 'throughput': 634.68}
[INFO|2026-04-23 03:43:18] logging.py:144 >> {'loss': 0.1201, 'learning_rate': 8.6881e-06, 'epoch': 2.18, 'throughput': 634.63}
[INFO|2026-04-23 03:49:07] logging.py:144 >> {'loss': 0.1250, 'learning_rate': 8.6225e-06, 'epoch': 2.19, 'throughput': 634.58}
[INFO|2026-04-23 03:54:35] logging.py:144 >> {'loss': 0.1270, 'learning_rate': 8.5571e-06, 'epoch': 2.19, 'throughput': 634.60}
[INFO|2026-04-23 04:00:03] logging.py:144 >> {'loss': 0.1285, 'learning_rate': 8.4919e-06, 'epoch': 2.19, 'throughput': 634.61}
[INFO|2026-04-23 04:04:58] logging.py:144 >> {'loss': 0.1225, 'learning_rate': 8.4269e-06, 'epoch': 2.20, 'throughput': 634.69}
[INFO|2026-04-23 04:10:19] logging.py:144 >> {'loss': 0.1283, 'learning_rate': 8.3621e-06, 'epoch': 2.20, 'throughput': 634.72}
[INFO|2026-04-23 04:15:38] logging.py:144 >> {'loss': 0.1248, 'learning_rate': 8.2975e-06, 'epoch': 2.20, 'throughput': 634.73}
[INFO|2026-04-23 04:21:19] logging.py:144 >> {'loss': 0.1210, 'learning_rate': 8.2331e-06, 'epoch': 2.21, 'throughput': 634.69}
[INFO|2026-04-23 04:27:22] logging.py:144 >> {'loss': 0.1258, 'learning_rate': 8.1689e-06, 'epoch': 2.21, 'throughput': 634.61}
[INFO|2026-04-23 04:33:37] logging.py:144 >> {'loss': 0.1324, 'learning_rate': 8.1049e-06, 'epoch': 2.21, 'throughput': 634.50}
[INFO|2026-04-23 04:39:21] logging.py:144 >> {'loss': 0.1276, 'learning_rate': 8.0411e-06, 'epoch': 2.22, 'throughput': 634.48}
[INFO|2026-04-23 04:45:24] logging.py:144 >> {'loss': 0.1311, 'learning_rate': 7.9775e-06, 'epoch': 2.22, 'throughput': 634.40}
[INFO|2026-04-23 04:51:09] logging.py:144 >> {'loss': 0.1206, 'learning_rate': 7.9141e-06, 'epoch': 2.22, 'throughput': 634.36}
[INFO|2026-04-23 04:56:42] logging.py:144 >> {'loss': 0.1329, 'learning_rate': 7.8510e-06, 'epoch': 2.23, 'throughput': 634.40}
[INFO|2026-04-23 05:01:58] logging.py:144 >> {'loss': 0.1183, 'learning_rate': 7.7880e-06, 'epoch': 2.23, 'throughput': 634.43}
[INFO|2026-04-23 05:07:05] logging.py:144 >> {'loss': 0.1261, 'learning_rate': 7.7252e-06, 'epoch': 2.23, 'throughput': 634.50}
[INFO|2026-04-23 05:12:19] logging.py:144 >> {'loss': 0.1255, 'learning_rate': 7.6627e-06, 'epoch': 2.24, 'throughput': 634.54}
[INFO|2026-04-23 05:17:33] logging.py:144 >> {'loss': 0.1259, 'learning_rate': 7.6003e-06, 'epoch': 2.24, 'throughput': 634.58}
[INFO|2026-04-23 05:22:47] logging.py:144 >> {'loss': 0.1212, 'learning_rate': 7.5382e-06, 'epoch': 2.24, 'throughput': 634.62}
[INFO|2026-04-23 05:27:59] logging.py:144 >> {'loss': 0.1210, 'learning_rate': 7.4762e-06, 'epoch': 2.25, 'throughput': 634.65}
[INFO|2026-04-23 05:33:14] logging.py:144 >> {'loss': 0.1277, 'learning_rate': 7.4145e-06, 'epoch': 2.25, 'throughput': 634.69}
[INFO|2026-04-23 05:38:44] logging.py:144 >> {'loss': 0.1232, 'learning_rate': 7.3530e-06, 'epoch': 2.25, 'throughput': 634.68}
[INFO|2026-04-23 05:44:18] logging.py:144 >> {'loss': 0.1277, 'learning_rate': 7.2917e-06, 'epoch': 2.26, 'throughput': 634.69}
[INFO|2026-04-23 05:49:24] logging.py:144 >> {'loss': 0.1314, 'learning_rate': 7.2306e-06, 'epoch': 2.26, 'throughput': 634.76}
[INFO|2026-04-23 05:55:03] logging.py:144 >> {'loss': 0.1313, 'learning_rate': 7.1698e-06, 'epoch': 2.26, 'throughput': 634.75}
[INFO|2026-04-23 06:00:34] logging.py:144 >> {'loss': 0.1225, 'learning_rate': 7.1091e-06, 'epoch': 2.27, 'throughput': 634.75}
[INFO|2026-04-23 06:05:55] logging.py:144 >> {'loss': 0.1213, 'learning_rate': 7.0487e-06, 'epoch': 2.27, 'throughput': 634.75}
[INFO|2026-04-23 06:11:30] logging.py:144 >> {'loss': 0.1265, 'learning_rate': 6.9884e-06, 'epoch': 2.27, 'throughput': 634.75}
[INFO|2026-04-23 06:16:48] logging.py:144 >> {'loss': 0.1217, 'learning_rate': 6.9284e-06, 'epoch': 2.28, 'throughput': 634.77}
[INFO|2026-04-23 06:22:30] logging.py:144 >> {'loss': 0.1293, 'learning_rate': 6.8686e-06, 'epoch': 2.28, 'throughput': 634.73}
[INFO|2026-04-23 06:27:41] logging.py:144 >> {'loss': 0.1273, 'learning_rate': 6.8091e-06, 'epoch': 2.28, 'throughput': 634.79}
[INFO|2026-04-23 06:33:28] logging.py:144 >> {'loss': 0.1220, 'learning_rate': 6.7497e-06, 'epoch': 2.29, 'throughput': 634.76}
[INFO|2026-04-23 06:39:17] logging.py:144 >> {'loss': 0.1256, 'learning_rate': 6.6906e-06, 'epoch': 2.29, 'throughput': 634.67}
[INFO|2026-04-23 06:44:30] logging.py:144 >> {'loss': 0.1300, 'learning_rate': 6.6316e-06, 'epoch': 2.29, 'throughput': 634.72}
[INFO|2026-04-23 06:50:03] logging.py:144 >> {'loss': 0.1217, 'learning_rate': 6.5729e-06, 'epoch': 2.30, 'throughput': 634.69}
[INFO|2026-04-23 06:55:43] logging.py:144 >> {'loss': 0.1216, 'learning_rate': 6.5145e-06, 'epoch': 2.30, 'throughput': 634.67}
[INFO|2026-04-23 07:01:50] logging.py:144 >> {'loss': 0.1248, 'learning_rate': 6.4562e-06, 'epoch': 2.30, 'throughput': 634.57}
[INFO|2026-04-23 07:07:15] logging.py:144 >> {'loss': 0.1290, 'learning_rate': 6.3982e-06, 'epoch': 2.31, 'throughput': 634.60}
[INFO|2026-04-23 07:12:41] logging.py:144 >> {'loss': 0.1275, 'learning_rate': 6.3404e-06, 'epoch': 2.31, 'throughput': 634.63}
[INFO|2026-04-23 07:17:41] logging.py:144 >> {'loss': 0.1213, 'learning_rate': 6.2828e-06, 'epoch': 2.31, 'throughput': 634.70}
[INFO|2026-04-23 07:22:40] logging.py:144 >> {'loss': 0.1222, 'learning_rate': 6.2255e-06, 'epoch': 2.32, 'throughput': 634.79}
[INFO|2026-04-23 07:28:05] logging.py:144 >> {'loss': 0.1228, 'learning_rate': 6.1683e-06, 'epoch': 2.32, 'throughput': 634.79}
[INFO|2026-04-23 07:33:44] logging.py:144 >> {'loss': 0.1288, 'learning_rate': 6.1114e-06, 'epoch': 2.32, 'throughput': 634.79}
[INFO|2026-04-23 07:39:16] logging.py:144 >> {'loss': 0.1238, 'learning_rate': 6.0547e-06, 'epoch': 2.33, 'throughput': 634.78}
[INFO|2026-04-23 07:44:36] logging.py:144 >> {'loss': 0.1240, 'learning_rate': 5.9983e-06, 'epoch': 2.33, 'throughput': 634.81}
[INFO|2026-04-23 07:49:44] logging.py:144 >> {'loss': 0.1276, 'learning_rate': 5.9421e-06, 'epoch': 2.33, 'throughput': 634.88}
[INFO|2026-04-23 07:54:48] logging.py:144 >> {'loss': 0.1201, 'learning_rate': 5.8861e-06, 'epoch': 2.34, 'throughput': 634.94}
[INFO|2026-04-23 08:00:10] logging.py:144 >> {'loss': 0.1162, 'learning_rate': 5.8303e-06, 'epoch': 2.34, 'throughput': 634.93}
[INFO|2026-04-23 08:05:19] logging.py:144 >> {'loss': 0.1188, 'learning_rate': 5.7748e-06, 'epoch': 2.34, 'throughput': 634.97}
[INFO|2026-04-23 08:10:46] logging.py:144 >> {'loss': 0.1190, 'learning_rate': 5.7195e-06, 'epoch': 2.35, 'throughput': 634.95}
[INFO|2026-04-23 08:16:11] logging.py:144 >> {'loss': 0.1294, 'learning_rate': 5.6644e-06, 'epoch': 2.35, 'throughput': 635.00}
[INFO|2026-04-23 08:21:03] logging.py:144 >> {'loss': 0.1214, 'learning_rate': 5.6096e-06, 'epoch': 2.35, 'throughput': 635.10}
[INFO|2026-04-23 08:26:43] logging.py:144 >> {'loss': 0.1202, 'learning_rate': 5.5550e-06, 'epoch': 2.36, 'throughput': 635.07}
[INFO|2026-04-23 08:32:01] logging.py:144 >> {'loss': 0.1245, 'learning_rate': 5.5006e-06, 'epoch': 2.36, 'throughput': 635.12}
[INFO|2026-04-23 08:37:15] logging.py:144 >> {'loss': 0.1263, 'learning_rate': 5.4465e-06, 'epoch': 2.36, 'throughput': 635.16}
[INFO|2026-04-23 08:41:32] logging.py:144 >> {'loss': 0.1222, 'learning_rate': 5.3926e-06, 'epoch': 2.37, 'throughput': 635.34}
[INFO|2026-04-23 08:46:08] logging.py:144 >> {'loss': 0.1228, 'learning_rate': 5.3389e-06, 'epoch': 2.37, 'throughput': 635.50}
[INFO|2026-04-23 08:50:35] logging.py:144 >> {'loss': 0.1226, 'learning_rate': 5.2855e-06, 'epoch': 2.37, 'throughput': 635.67}
[INFO|2026-04-23 08:55:30] logging.py:144 >> {'loss': 0.1203, 'learning_rate': 5.2323e-06, 'epoch': 2.38, 'throughput': 635.77}
[INFO|2026-04-23 09:00:01] logging.py:144 >> {'loss': 0.1221, 'learning_rate': 5.1794e-06, 'epoch': 2.38, 'throughput': 635.93}
[INFO|2026-04-23 09:04:48] logging.py:144 >> {'loss': 0.1252, 'learning_rate': 5.1267e-06, 'epoch': 2.38, 'throughput': 636.06}
[INFO|2026-04-23 09:09:34] logging.py:144 >> {'loss': 0.1276, 'learning_rate': 5.0742e-06, 'epoch': 2.39, 'throughput': 636.19}
[INFO|2026-04-23 09:14:00] logging.py:144 >> {'loss': 0.1192, 'learning_rate': 5.0219e-06, 'epoch': 2.39, 'throughput': 636.36}
[INFO|2026-04-23 09:18:21] logging.py:144 >> {'loss': 0.1266, 'learning_rate': 4.9700e-06, 'epoch': 2.39, 'throughput': 636.55}
[INFO|2026-04-23 09:22:56] logging.py:144 >> {'loss': 0.1191, 'learning_rate': 4.9182e-06, 'epoch': 2.40, 'throughput': 636.71}
[INFO|2026-04-23 09:27:59] logging.py:144 >> {'loss': 0.1216, 'learning_rate': 4.8667e-06, 'epoch': 2.40, 'throughput': 636.78}
[INFO|2026-04-23 09:32:36] logging.py:144 >> {'loss': 0.1246, 'learning_rate': 4.8154e-06, 'epoch': 2.40, 'throughput': 636.94}
[INFO|2026-04-23 09:37:17] logging.py:144 >> {'loss': 0.1196, 'learning_rate': 4.7644e-06, 'epoch': 2.41, 'throughput': 637.07}
[INFO|2026-04-23 09:41:55] logging.py:144 >> {'loss': 0.1250, 'learning_rate': 4.7136e-06, 'epoch': 2.41, 'throughput': 637.20}
[INFO|2026-04-23 09:47:03] logging.py:144 >> {'loss': 0.1268, 'learning_rate': 4.6631e-06, 'epoch': 2.41, 'throughput': 637.27}
[INFO|2026-04-23 09:52:20] logging.py:144 >> {'loss': 0.1201, 'learning_rate': 4.6128e-06, 'epoch': 2.42, 'throughput': 637.29}
[INFO|2026-04-23 09:57:03] logging.py:144 >> {'loss': 0.1194, 'learning_rate': 4.5627e-06, 'epoch': 2.42, 'throughput': 637.40}
[INFO|2026-04-23 10:01:31] logging.py:144 >> {'loss': 0.1227, 'learning_rate': 4.5129e-06, 'epoch': 2.42, 'throughput': 637.56}
[INFO|2026-04-23 10:06:17] logging.py:144 >> {'loss': 0.1236, 'learning_rate': 4.4634e-06, 'epoch': 2.43, 'throughput': 637.67}
[INFO|2026-04-23 10:10:40] logging.py:144 >> {'loss': 0.1275, 'learning_rate': 4.4140e-06, 'epoch': 2.43, 'throughput': 637.86}
[INFO|2026-04-23 10:15:13] logging.py:144 >> {'loss': 0.1131, 'learning_rate': 4.3650e-06, 'epoch': 2.43, 'throughput': 637.98}
[INFO|2026-04-23 10:19:31] logging.py:144 >> {'loss': 0.1212, 'learning_rate': 4.3162e-06, 'epoch': 2.44, 'throughput': 638.16}
[INFO|2026-04-23 10:24:16] logging.py:144 >> {'loss': 0.1304, 'learning_rate': 4.2676e-06, 'epoch': 2.44, 'throughput': 638.31}
[INFO|2026-04-23 10:28:47] logging.py:144 >> {'loss': 0.1255, 'learning_rate': 4.2193e-06, 'epoch': 2.44, 'throughput': 638.46}
[INFO|2026-04-23 10:33:22] logging.py:144 >> {'loss': 0.1220, 'learning_rate': 4.1712e-06, 'epoch': 2.44, 'throughput': 638.60}
[INFO|2026-04-23 10:38:01] logging.py:144 >> {'loss': 0.1284, 'learning_rate': 4.1234e-06, 'epoch': 2.45, 'throughput': 638.76}
[INFO|2026-04-23 10:42:36] logging.py:144 >> {'loss': 0.1192, 'learning_rate': 4.0758e-06, 'epoch': 2.45, 'throughput': 638.89}
[INFO|2026-04-23 10:46:58] logging.py:144 >> {'loss': 0.1171, 'learning_rate': 4.0285e-06, 'epoch': 2.45, 'throughput': 639.06}
[INFO|2026-04-23 10:51:42] logging.py:144 >> {'loss': 0.1259, 'learning_rate': 3.9815e-06, 'epoch': 2.46, 'throughput': 639.20}
[INFO|2026-04-23 10:55:58] logging.py:144 >> {'loss': 0.1252, 'learning_rate': 3.9346e-06, 'epoch': 2.46, 'throughput': 639.40}
[INFO|2026-04-23 11:00:02] logging.py:144 >> {'loss': 0.1179, 'learning_rate': 3.8881e-06, 'epoch': 2.46, 'throughput': 639.61}
[INFO|2026-04-23 11:04:06] logging.py:144 >> {'loss': 0.1319, 'learning_rate': 3.8418e-06, 'epoch': 2.47, 'throughput': 639.85}
[INFO|2026-04-23 11:08:35] logging.py:144 >> {'loss': 0.1244, 'learning_rate': 3.7957e-06, 'epoch': 2.47, 'throughput': 640.01}
[INFO|2026-04-23 11:12:49] logging.py:144 >> {'loss': 0.1186, 'learning_rate': 3.7499e-06, 'epoch': 2.47, 'throughput': 640.18}
[INFO|2026-04-23 11:17:06] logging.py:144 >> {'loss': 0.1219, 'learning_rate': 3.7044e-06, 'epoch': 2.48, 'throughput': 640.36}
[INFO|2026-04-23 11:21:32] logging.py:144 >> {'loss': 0.1262, 'learning_rate': 3.6591e-06, 'epoch': 2.48, 'throughput': 640.55}
[INFO|2026-04-23 11:25:59] logging.py:144 >> {'loss': 0.1294, 'learning_rate': 3.6141e-06, 'epoch': 2.48, 'throughput': 640.73}
[INFO|2026-04-23 11:30:35] logging.py:144 >> {'loss': 0.1221, 'learning_rate': 3.5693e-06, 'epoch': 2.49, 'throughput': 640.87}
[INFO|2026-04-23 11:35:11] logging.py:144 >> {'loss': 0.1242, 'learning_rate': 3.5248e-06, 'epoch': 2.49, 'throughput': 641.02}
[INFO|2026-04-23 11:39:32] logging.py:144 >> {'loss': 0.1243, 'learning_rate': 3.4806e-06, 'epoch': 2.49, 'throughput': 641.20}
[INFO|2026-04-23 11:43:50] logging.py:144 >> {'loss': 0.1202, 'learning_rate': 3.4366e-06, 'epoch': 2.50, 'throughput': 641.38}
[INFO|2026-04-23 11:48:15] logging.py:144 >> {'loss': 0.1189, 'learning_rate': 3.3928e-06, 'epoch': 2.50, 'throughput': 641.55}
[INFO|2026-04-23 11:53:08] logging.py:144 >> {'loss': 0.1161, 'learning_rate': 3.3494e-06, 'epoch': 2.50, 'throughput': 641.63}
[INFO|2026-04-23 11:57:45] logging.py:144 >> {'loss': 0.1294, 'learning_rate': 3.3062e-06, 'epoch': 2.51, 'throughput': 641.78}
[INFO|2026-04-23 12:01:57] logging.py:144 >> {'loss': 0.1206, 'learning_rate': 3.2632e-06, 'epoch': 2.51, 'throughput': 641.99}
[INFO|2026-04-23 12:05:58] logging.py:144 >> {'loss': 0.1193, 'learning_rate': 3.2205e-06, 'epoch': 2.51, 'throughput': 642.22}
[INFO|2026-04-23 12:10:37] logging.py:144 >> {'loss': 0.1243, 'learning_rate': 3.1781e-06, 'epoch': 2.52, 'throughput': 642.36}
[INFO|2026-04-23 12:15:19] logging.py:144 >> {'loss': 0.1233, 'learning_rate': 3.1359e-06, 'epoch': 2.52, 'throughput': 642.49}
[INFO|2026-04-23 12:20:17] logging.py:144 >> {'loss': 0.1181, 'learning_rate': 3.0940e-06, 'epoch': 2.52, 'throughput': 642.54}
[INFO|2026-04-23 12:25:20] logging.py:144 >> {'loss': 0.1210, 'learning_rate': 3.0524e-06, 'epoch': 2.53, 'throughput': 642.61}
[INFO|2026-04-23 12:30:12] logging.py:144 >> {'loss': 0.1214, 'learning_rate': 3.0110e-06, 'epoch': 2.53, 'throughput': 642.70}
[INFO|2026-04-23 12:35:18] logging.py:144 >> {'loss': 0.1194, 'learning_rate': 2.9699e-06, 'epoch': 2.53, 'throughput': 642.76}
[INFO|2026-04-23 12:40:45] logging.py:144 >> {'loss': 0.1205, 'learning_rate': 2.9290e-06, 'epoch': 2.54, 'throughput': 642.77}
[INFO|2026-04-23 12:45:50] logging.py:144 >> {'loss': 0.1203, 'learning_rate': 2.8884e-06, 'epoch': 2.54, 'throughput': 642.81}
[INFO|2026-04-23 12:50:44] logging.py:144 >> {'loss': 0.1163, 'learning_rate': 2.8481e-06, 'epoch': 2.54, 'throughput': 642.88}
[INFO|2026-04-23 12:55:43] logging.py:144 >> {'loss': 0.1240, 'learning_rate': 2.8081e-06, 'epoch': 2.55, 'throughput': 642.96}
[INFO|2026-04-23 13:01:13] logging.py:144 >> {'loss': 0.1299, 'learning_rate': 2.7683e-06, 'epoch': 2.55, 'throughput': 642.94}
[INFO|2026-04-23 13:06:40] logging.py:144 >> {'loss': 0.1229, 'learning_rate': 2.7288e-06, 'epoch': 2.55, 'throughput': 642.95}
[INFO|2026-04-23 13:11:55] logging.py:144 >> {'loss': 0.1227, 'learning_rate': 2.6895e-06, 'epoch': 2.56, 'throughput': 642.97}
[INFO|2026-04-23 13:16:58] logging.py:144 >> {'loss': 0.1168, 'learning_rate': 2.6505e-06, 'epoch': 2.56, 'throughput': 643.00}
[INFO|2026-04-23 13:22:15] logging.py:144 >> {'loss': 0.1220, 'learning_rate': 2.6118e-06, 'epoch': 2.56, 'throughput': 643.03}
[INFO|2026-04-23 13:27:25] logging.py:144 >> {'loss': 0.1209, 'learning_rate': 2.5734e-06, 'epoch': 2.57, 'throughput': 643.07}
[INFO|2026-04-23 13:33:08] logging.py:144 >> {'loss': 0.1252, 'learning_rate': 2.5352e-06, 'epoch': 2.57, 'throughput': 643.04}
[INFO|2026-04-23 13:38:14] logging.py:144 >> {'loss': 0.1299, 'learning_rate': 2.4973e-06, 'epoch': 2.57, 'throughput': 643.10}
[INFO|2026-04-23 13:43:35] logging.py:144 >> {'loss': 0.1194, 'learning_rate': 2.4597e-06, 'epoch': 2.58, 'throughput': 643.13}
[INFO|2026-04-23 13:48:23] logging.py:144 >> {'loss': 0.1272, 'learning_rate': 2.4223e-06, 'epoch': 2.58, 'throughput': 643.23}
[INFO|2026-04-23 13:53:38] logging.py:144 >> {'loss': 0.1254, 'learning_rate': 2.3852e-06, 'epoch': 2.58, 'throughput': 643.26}
[INFO|2026-04-23 13:59:01] logging.py:144 >> {'loss': 0.1231, 'learning_rate': 2.3484e-06, 'epoch': 2.59, 'throughput': 643.29}
[INFO|2026-04-23 14:04:10] logging.py:144 >> {'loss': 0.1157, 'learning_rate': 2.3119e-06, 'epoch': 2.59, 'throughput': 643.32}
[INFO|2026-04-23 14:09:20] logging.py:144 >> {'loss': 0.1284, 'learning_rate': 2.2756e-06, 'epoch': 2.59, 'throughput': 643.38}
[INFO|2026-04-23 14:14:11] logging.py:144 >> {'loss': 0.1237, 'learning_rate': 2.2396e-06, 'epoch': 2.60, 'throughput': 643.47}
[INFO|2026-04-23 14:19:25] logging.py:144 >> {'loss': 0.1140, 'learning_rate': 2.2039e-06, 'epoch': 2.60, 'throughput': 643.48}
[INFO|2026-04-23 14:25:06] logging.py:144 >> {'loss': 0.1278, 'learning_rate': 2.1684e-06, 'epoch': 2.60, 'throughput': 643.46}
[INFO|2026-04-23 14:30:32] logging.py:144 >> {'loss': 0.1150, 'learning_rate': 2.1332e-06, 'epoch': 2.61, 'throughput': 643.44}
[INFO|2026-04-23 14:35:51] logging.py:144 >> {'loss': 0.1224, 'learning_rate': 2.0983e-06, 'epoch': 2.61, 'throughput': 643.47}
[INFO|2026-04-23 14:41:00] logging.py:144 >> {'loss': 0.1272, 'learning_rate': 2.0637e-06, 'epoch': 2.61, 'throughput': 643.54}
[INFO|2026-04-23 14:46:30] logging.py:144 >> {'loss': 0.1173, 'learning_rate': 2.0294e-06, 'epoch': 2.62, 'throughput': 643.52}
[INFO|2026-04-23 14:51:55] logging.py:144 >> {'loss': 0.1257, 'learning_rate': 1.9953e-06, 'epoch': 2.62, 'throughput': 643.51}
[INFO|2026-04-23 14:57:08] logging.py:144 >> {'loss': 0.1230, 'learning_rate': 1.9615e-06, 'epoch': 2.62, 'throughput': 643.53}
[INFO|2026-04-23 15:02:21] logging.py:144 >> {'loss': 0.1166, 'learning_rate': 1.9280e-06, 'epoch': 2.63, 'throughput': 643.56}
[INFO|2026-04-23 15:07:58] logging.py:144 >> {'loss': 0.1183, 'learning_rate': 1.8947e-06, 'epoch': 2.63, 'throughput': 643.55}
[INFO|2026-04-23 15:13:37] logging.py:144 >> {'loss': 0.1233, 'learning_rate': 1.8618e-06, 'epoch': 2.63, 'throughput': 643.52}
[INFO|2026-04-23 15:19:30] logging.py:144 >> {'loss': 0.1211, 'learning_rate': 1.8291e-06, 'epoch': 2.64, 'throughput': 643.46}
[INFO|2026-04-23 15:25:16] logging.py:144 >> {'loss': 0.1240, 'learning_rate': 1.7967e-06, 'epoch': 2.64, 'throughput': 643.42}
[INFO|2026-04-23 15:30:42] logging.py:144 >> {'loss': 0.1199, 'learning_rate': 1.7645e-06, 'epoch': 2.64, 'throughput': 643.42}
[INFO|2026-04-23 15:36:10] logging.py:144 >> {'loss': 0.1234, 'learning_rate': 1.7327e-06, 'epoch': 2.65, 'throughput': 643.42}
[INFO|2026-04-23 15:42:12] logging.py:144 >> {'loss': 0.1233, 'learning_rate': 1.7011e-06, 'epoch': 2.65, 'throughput': 643.33}
[INFO|2026-04-23 15:47:46] logging.py:144 >> {'loss': 0.1225, 'learning_rate': 1.6698e-06, 'epoch': 2.65, 'throughput': 643.32}
[INFO|2026-04-23 15:53:21] logging.py:144 >> {'loss': 0.1269, 'learning_rate': 1.6388e-06, 'epoch': 2.66, 'throughput': 643.29}
[INFO|2026-04-23 15:58:56] logging.py:144 >> {'loss': 0.1238, 'learning_rate': 1.6081e-06, 'epoch': 2.66, 'throughput': 643.28}
[INFO|2026-04-23 16:04:38] logging.py:144 >> {'loss': 0.1254, 'learning_rate': 1.5776e-06, 'epoch': 2.66, 'throughput': 643.26}
[INFO|2026-04-23 16:09:43] logging.py:144 >> {'loss': 0.1193, 'learning_rate': 1.5475e-06, 'epoch': 2.67, 'throughput': 643.29}
[INFO|2026-04-23 16:15:09] logging.py:144 >> {'loss': 0.1167, 'learning_rate': 1.5176e-06, 'epoch': 2.67, 'throughput': 643.26}
[INFO|2026-04-23 16:20:38] logging.py:144 >> {'loss': 0.1148, 'learning_rate': 1.4880e-06, 'epoch': 2.67, 'throughput': 643.24}
[INFO|2026-04-23 16:25:53] logging.py:144 >> {'loss': 0.1212, 'learning_rate': 1.4587e-06, 'epoch': 2.68, 'throughput': 643.27}
[INFO|2026-04-23 16:31:05] logging.py:144 >> {'loss': 0.1189, 'learning_rate': 1.4296e-06, 'epoch': 2.68, 'throughput': 643.31}
[INFO|2026-04-23 16:36:46] logging.py:144 >> {'loss': 0.1302, 'learning_rate': 1.4009e-06, 'epoch': 2.68, 'throughput': 643.29}
[INFO|2026-04-23 16:41:49] logging.py:144 >> {'loss': 0.1180, 'learning_rate': 1.3724e-06, 'epoch': 2.69, 'throughput': 643.33}
[INFO|2026-04-23 16:46:42] logging.py:144 >> {'loss': 0.1180, 'learning_rate': 1.3442e-06, 'epoch': 2.69, 'throughput': 643.41}
[INFO|2026-04-23 16:52:04] logging.py:144 >> {'loss': 0.1163, 'learning_rate': 1.3163e-06, 'epoch': 2.69, 'throughput': 643.39}
[INFO|2026-04-23 16:57:25] logging.py:144 >> {'loss': 0.1214, 'learning_rate': 1.2887e-06, 'epoch': 2.70, 'throughput': 643.41}
[INFO|2026-04-23 17:03:32] logging.py:144 >> {'loss': 0.1224, 'learning_rate': 1.2614e-06, 'epoch': 2.70, 'throughput': 643.33}
[INFO|2026-04-23 17:09:16] logging.py:144 >> {'loss': 0.1196, 'learning_rate': 1.2343e-06, 'epoch': 2.70, 'throughput': 643.31}
[INFO|2026-04-23 17:15:12] logging.py:144 >> {'loss': 0.1207, 'learning_rate': 1.2076e-06, 'epoch': 2.71, 'throughput': 643.24}
[INFO|2026-04-23 17:20:31] logging.py:144 >> {'loss': 0.1258, 'learning_rate': 1.1811e-06, 'epoch': 2.71, 'throughput': 643.27}
[INFO|2026-04-23 17:25:15] logging.py:144 >> {'loss': 0.1142, 'learning_rate': 1.1549e-06, 'epoch': 2.71, 'throughput': 643.37}
[INFO|2026-04-23 17:30:42] logging.py:144 >> {'loss': 0.1319, 'learning_rate': 1.1290e-06, 'epoch': 2.72, 'throughput': 643.39}
[INFO|2026-04-23 17:36:00] logging.py:144 >> {'loss': 0.1202, 'learning_rate': 1.1034e-06, 'epoch': 2.72, 'throughput': 643.43}
[INFO|2026-04-23 17:42:05] logging.py:144 >> {'loss': 0.1211, 'learning_rate': 1.0781e-06, 'epoch': 2.72, 'throughput': 643.34}
[INFO|2026-04-23 17:47:22] logging.py:144 >> {'loss': 0.1165, 'learning_rate': 1.0530e-06, 'epoch': 2.73, 'throughput': 643.36}
[INFO|2026-04-23 17:52:56] logging.py:144 >> {'loss': 0.1209, 'learning_rate': 1.0283e-06, 'epoch': 2.73, 'throughput': 643.33}
[INFO|2026-04-23 17:58:36] logging.py:144 >> {'loss': 0.1263, 'learning_rate': 1.0038e-06, 'epoch': 2.73, 'throughput': 643.30}
[INFO|2026-04-23 18:04:16] logging.py:144 >> {'loss': 0.1239, 'learning_rate': 9.7964e-07, 'epoch': 2.74, 'throughput': 643.27}
[INFO|2026-04-23 18:09:49] logging.py:144 >> {'loss': 0.1212, 'learning_rate': 9.5575e-07, 'epoch': 2.74, 'throughput': 643.26}
[INFO|2026-04-23 18:16:27] logging.py:144 >> {'loss': 0.1252, 'learning_rate': 9.3216e-07, 'epoch': 2.74, 'throughput': 643.11}
[INFO|2026-04-23 18:22:18] logging.py:144 >> {'loss': 0.1191, 'learning_rate': 9.0885e-07, 'epoch': 2.75, 'throughput': 643.05}
[INFO|2026-04-23 18:27:45] logging.py:144 >> {'loss': 0.1275, 'learning_rate': 8.8584e-07, 'epoch': 2.75, 'throughput': 643.05}
[INFO|2026-04-23 18:33:30] logging.py:144 >> {'loss': 0.1219, 'learning_rate': 8.6311e-07, 'epoch': 2.75, 'throughput': 643.01}
[INFO|2026-04-23 18:39:08] logging.py:144 >> {'loss': 0.1162, 'learning_rate': 8.4067e-07, 'epoch': 2.76, 'throughput': 642.97}
[INFO|2026-04-23 18:45:12] logging.py:144 >> {'loss': 0.1268, 'learning_rate': 8.1853e-07, 'epoch': 2.76, 'throughput': 642.89}
[INFO|2026-04-23 18:50:35] logging.py:144 >> {'loss': 0.1190, 'learning_rate': 7.9667e-07, 'epoch': 2.76, 'throughput': 642.90}
[INFO|2026-04-23 18:55:46] logging.py:144 >> {'loss': 0.1130, 'learning_rate': 7.7511e-07, 'epoch': 2.77, 'throughput': 642.91}
[INFO|2026-04-23 19:01:48] logging.py:144 >> {'loss': 0.1192, 'learning_rate': 7.5383e-07, 'epoch': 2.77, 'throughput': 642.84}
[INFO|2026-04-23 19:07:36] logging.py:144 >> {'loss': 0.1230, 'learning_rate': 7.3285e-07, 'epoch': 2.77, 'throughput': 642.80}
[INFO|2026-04-23 19:13:16] logging.py:144 >> {'loss': 0.1195, 'learning_rate': 7.1216e-07, 'epoch': 2.78, 'throughput': 642.76}
[INFO|2026-04-23 19:19:13] logging.py:144 >> {'loss': 0.1208, 'learning_rate': 6.9176e-07, 'epoch': 2.78, 'throughput': 642.68}
[INFO|2026-04-23 19:24:27] logging.py:144 >> {'loss': 0.1225, 'learning_rate': 6.7166e-07, 'epoch': 2.78, 'throughput': 642.71}
[INFO|2026-04-23 19:30:13] logging.py:144 >> {'loss': 0.1170, 'learning_rate': 6.5185e-07, 'epoch': 2.79, 'throughput': 642.66}
[INFO|2026-04-23 19:36:01] logging.py:144 >> {'loss': 0.1279, 'learning_rate': 6.3233e-07, 'epoch': 2.79, 'throughput': 642.64}
[INFO|2026-04-23 19:41:48] logging.py:144 >> {'loss': 0.1145, 'learning_rate': 6.1310e-07, 'epoch': 2.79, 'throughput': 642.59}
[INFO|2026-04-23 19:47:24] logging.py:144 >> {'loss': 0.1231, 'learning_rate': 5.9416e-07, 'epoch': 2.80, 'throughput': 642.56}
[INFO|2026-04-23 19:52:58] logging.py:144 >> {'loss': 0.1189, 'learning_rate': 5.7552e-07, 'epoch': 2.80, 'throughput': 642.54}
[INFO|2026-04-23 19:58:53] logging.py:144 >> {'loss': 0.1174, 'learning_rate': 5.5718e-07, 'epoch': 2.80, 'throughput': 642.47}
[INFO|2026-04-23 20:04:47] logging.py:144 >> {'loss': 0.1246, 'learning_rate': 5.3912e-07, 'epoch': 2.81, 'throughput': 642.41}
[INFO|2026-04-23 20:10:29] logging.py:144 >> {'loss': 0.1204, 'learning_rate': 5.2137e-07, 'epoch': 2.81, 'throughput': 642.38}
[INFO|2026-04-23 20:16:03] logging.py:144 >> {'loss': 0.1130, 'learning_rate': 5.0390e-07, 'epoch': 2.81, 'throughput': 642.34}
[INFO|2026-04-23 20:21:41] logging.py:144 >> {'loss': 0.1139, 'learning_rate': 4.8673e-07, 'epoch': 2.82, 'throughput': 642.28}
[INFO|2026-04-23 20:26:51] logging.py:144 >> {'loss': 0.1255, 'learning_rate': 4.6986e-07, 'epoch': 2.82, 'throughput': 642.31}
[INFO|2026-04-23 20:32:53] logging.py:144 >> {'loss': 0.1258, 'learning_rate': 4.5328e-07, 'epoch': 2.82, 'throughput': 642.25}
[INFO|2026-04-23 20:38:28] logging.py:144 >> {'loss': 0.1148, 'learning_rate': 4.3699e-07, 'epoch': 2.83, 'throughput': 642.23}
[INFO|2026-04-23 20:43:29] logging.py:144 >> {'loss': 0.1207, 'learning_rate': 4.2100e-07, 'epoch': 2.83, 'throughput': 642.28}
[INFO|2026-04-23 20:48:23] logging.py:144 >> {'loss': 0.1181, 'learning_rate': 4.0531e-07, 'epoch': 2.83, 'throughput': 642.34}
[INFO|2026-04-23 20:53:44] logging.py:144 >> {'loss': 0.1212, 'learning_rate': 3.8991e-07, 'epoch': 2.84, 'throughput': 642.35}
[INFO|2026-04-23 20:59:32] logging.py:144 >> {'loss': 0.1177, 'learning_rate': 3.7480e-07, 'epoch': 2.84, 'throughput': 642.29}
[INFO|2026-04-23 21:04:52] logging.py:144 >> {'loss': 0.1125, 'learning_rate': 3.6000e-07, 'epoch': 2.84, 'throughput': 642.30}
[INFO|2026-04-23 21:10:01] logging.py:144 >> {'loss': 0.1267, 'learning_rate': 3.4549e-07, 'epoch': 2.85, 'throughput': 642.35}
[INFO|2026-04-23 21:15:11] logging.py:144 >> {'loss': 0.1176, 'learning_rate': 3.3127e-07, 'epoch': 2.85, 'throughput': 642.39}
[INFO|2026-04-23 21:20:24] logging.py:144 >> {'loss': 0.1165, 'learning_rate': 3.1736e-07, 'epoch': 2.85, 'throughput': 642.41}
[INFO|2026-04-23 21:25:18] logging.py:144 >> {'loss': 0.1163, 'learning_rate': 3.0374e-07, 'epoch': 2.86, 'throughput': 642.48}
[INFO|2026-04-23 21:30:30] logging.py:144 >> {'loss': 0.1253, 'learning_rate': 2.9041e-07, 'epoch': 2.86, 'throughput': 642.53}
[INFO|2026-04-23 21:35:33] logging.py:144 >> {'loss': 0.1232, 'learning_rate': 2.7739e-07, 'epoch': 2.86, 'throughput': 642.57}
[INFO|2026-04-23 21:40:47] logging.py:144 >> {'loss': 0.1290, 'learning_rate': 2.6466e-07, 'epoch': 2.87, 'throughput': 642.61}
[INFO|2026-04-23 21:46:17] logging.py:144 >> {'loss': 0.1139, 'learning_rate': 2.5223e-07, 'epoch': 2.87, 'throughput': 642.60}
[INFO|2026-04-23 21:51:42] logging.py:144 >> {'loss': 0.1189, 'learning_rate': 2.4009e-07, 'epoch': 2.87, 'throughput': 642.61}
[INFO|2026-04-23 21:57:19] logging.py:144 >> {'loss': 0.1202, 'learning_rate': 2.2826e-07, 'epoch': 2.88, 'throughput': 642.57}
[INFO|2026-04-23 22:02:41] logging.py:144 >> {'loss': 0.1205, 'learning_rate': 2.1672e-07, 'epoch': 2.88, 'throughput': 642.58}
[INFO|2026-04-23 22:07:29] logging.py:144 >> {'loss': 0.1153, 'learning_rate': 2.0548e-07, 'epoch': 2.88, 'throughput': 642.66}
[INFO|2026-04-23 22:12:16] logging.py:144 >> {'loss': 0.1232, 'learning_rate': 1.9453e-07, 'epoch': 2.89, 'throughput': 642.76}
[INFO|2026-04-23 22:17:31] logging.py:144 >> {'loss': 0.1213, 'learning_rate': 1.8389e-07, 'epoch': 2.89, 'throughput': 642.78}
[INFO|2026-04-23 22:22:25] logging.py:144 >> {'loss': 0.1284, 'learning_rate': 1.7354e-07, 'epoch': 2.89, 'throughput': 642.86}
[INFO|2026-04-23 22:27:21] logging.py:144 >> {'loss': 0.1237, 'learning_rate': 1.6350e-07, 'epoch': 2.90, 'throughput': 642.93}
[INFO|2026-04-23 22:32:32] logging.py:144 >> {'loss': 0.1223, 'learning_rate': 1.5375e-07, 'epoch': 2.90, 'throughput': 642.99}
[INFO|2026-04-23 22:38:05] logging.py:144 >> {'loss': 0.1159, 'learning_rate': 1.4430e-07, 'epoch': 2.90, 'throughput': 642.99}
[INFO|2026-04-23 22:43:09] logging.py:144 >> {'loss': 0.1243, 'learning_rate': 1.3515e-07, 'epoch': 2.91, 'throughput': 643.05}
[INFO|2026-04-23 22:47:43] logging.py:144 >> {'loss': 0.1170, 'learning_rate': 1.2629e-07, 'epoch': 2.91, 'throughput': 643.16}
[INFO|2026-04-23 22:53:04] logging.py:144 >> {'loss': 0.1153, 'learning_rate': 1.1774e-07, 'epoch': 2.91, 'throughput': 643.15}
[INFO|2026-04-23 22:58:34] logging.py:144 >> {'loss': 0.1225, 'learning_rate': 1.0949e-07, 'epoch': 2.92, 'throughput': 643.14}
[INFO|2026-04-23 23:03:33] logging.py:144 >> {'loss': 0.1235, 'learning_rate': 1.0153e-07, 'epoch': 2.92, 'throughput': 643.20}
[INFO|2026-04-23 23:08:47] logging.py:144 >> {'loss': 0.1221, 'learning_rate': 9.3877e-08, 'epoch': 2.92, 'throughput': 643.22}
[INFO|2026-04-23 23:14:10] logging.py:144 >> {'loss': 0.1196, 'learning_rate': 8.6522e-08, 'epoch': 2.93, 'throughput': 643.23}
[INFO|2026-04-23 23:19:25] logging.py:144 >> {'loss': 0.1166, 'learning_rate': 7.9466e-08, 'epoch': 2.93, 'throughput': 643.27}
[INFO|2026-04-23 23:24:16] logging.py:144 >> {'loss': 0.1154, 'learning_rate': 7.2709e-08, 'epoch': 2.93, 'throughput': 643.31}
[INFO|2026-04-23 23:29:03] logging.py:144 >> {'loss': 0.1184, 'learning_rate': 6.6252e-08, 'epoch': 2.94, 'throughput': 643.39}
[INFO|2026-04-23 23:34:17] logging.py:144 >> {'loss': 0.1156, 'learning_rate': 6.0095e-08, 'epoch': 2.94, 'throughput': 643.40}
[INFO|2026-04-23 23:39:46] logging.py:144 >> {'loss': 0.1173, 'learning_rate': 5.4238e-08, 'epoch': 2.94, 'throughput': 643.39}
[INFO|2026-04-23 23:44:34] logging.py:144 >> {'loss': 0.1157, 'learning_rate': 4.8681e-08, 'epoch': 2.95, 'throughput': 643.45}
[INFO|2026-04-23 23:49:46] logging.py:144 >> {'loss': 0.1246, 'learning_rate': 4.3424e-08, 'epoch': 2.95, 'throughput': 643.50}
[INFO|2026-04-23 23:54:49] logging.py:144 >> {'loss': 0.1303, 'learning_rate': 3.8466e-08, 'epoch': 2.95, 'throughput': 643.59}
[INFO|2026-04-23 23:59:41] logging.py:144 >> {'loss': 0.1234, 'learning_rate': 3.3809e-08, 'epoch': 2.96, 'throughput': 643.67}
[INFO|2026-04-24 00:04:51] logging.py:144 >> {'loss': 0.1132, 'learning_rate': 2.9453e-08, 'epoch': 2.96, 'throughput': 643.69}
[INFO|2026-04-24 00:10:00] logging.py:144 >> {'loss': 0.1146, 'learning_rate': 2.5396e-08, 'epoch': 2.96, 'throughput': 643.71}
[INFO|2026-04-24 00:15:16] logging.py:144 >> {'loss': 0.1211, 'learning_rate': 2.1640e-08, 'epoch': 2.97, 'throughput': 643.73}
[INFO|2026-04-24 00:20:45] logging.py:144 >> {'loss': 0.1202, 'learning_rate': 1.8184e-08, 'epoch': 2.97, 'throughput': 643.73}
[INFO|2026-04-24 00:26:15] logging.py:144 >> {'loss': 0.1187, 'learning_rate': 1.5028e-08, 'epoch': 2.97, 'throughput': 643.72}
[INFO|2026-04-24 00:31:58] logging.py:144 >> {'loss': 0.1216, 'learning_rate': 1.2173e-08, 'epoch': 2.98, 'throughput': 643.68}
[INFO|2026-04-24 00:37:35] logging.py:144 >> {'loss': 0.1163, 'learning_rate': 9.6185e-09, 'epoch': 2.98, 'throughput': 643.65}
[INFO|2026-04-24 00:43:06] logging.py:144 >> {'loss': 0.1179, 'learning_rate': 7.3642e-09, 'epoch': 2.98, 'throughput': 643.64}
[INFO|2026-04-24 00:48:19] logging.py:144 >> {'loss': 0.1213, 'learning_rate': 5.4105e-09, 'epoch': 2.99, 'throughput': 643.67}
[INFO|2026-04-24 00:53:46] logging.py:144 >> {'loss': 0.1221, 'learning_rate': 3.7574e-09, 'epoch': 2.99, 'throughput': 643.67}
[INFO|2026-04-24 00:59:37] logging.py:144 >> {'loss': 0.1164, 'learning_rate': 2.4047e-09, 'epoch': 2.99, 'throughput': 643.60}
[INFO|2026-04-24 01:05:23] logging.py:144 >> {'loss': 0.1188, 'learning_rate': 1.3527e-09, 'epoch': 3.00, 'throughput': 643.56}
[INFO|2026-04-24 01:10:45] logging.py:144 >> {'loss': 0.1220, 'learning_rate': 6.0119e-10, 'epoch': 3.00, 'throughput': 643.56}
[INFO|2026-04-24 01:11:30] logging.py:144 >> {'loss': 0.0772, 'learning_rate': 1.5030e-10, 'epoch': 3.00, 'throughput': 643.54}
[INFO|2026-04-24 01:11:41] trainer.py:3797 >> Saving model checkpoint to saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906
[INFO|2026-04-24 01:11:41] configuration_utils.py:432 >> Configuration saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/config.json
[INFO|2026-04-24 01:11:41] configuration_utils.py:803 >> Configuration saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/generation_config.json
[INFO|2026-04-24 01:11:51] modeling_utils.py:3380 >> Model weights saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/model.safetensors
[INFO|2026-04-24 01:11:51] tokenization_utils_base.py:3224 >> chat template saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/chat_template.jinja
[INFO|2026-04-24 01:11:51] tokenization_utils_base.py:2078 >> tokenizer config file saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/tokenizer_config.json
[INFO|2026-04-24 01:12:02] tokenization_utils_base.py:3224 >> chat template saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/chat_template.jinja
[INFO|2026-04-24 01:12:02] tokenization_utils_base.py:2078 >> tokenizer config file saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/tokenizer_config.json
[INFO|2026-04-24 01:12:03] processing_utils.py:870 >> processor saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906/processor_config.json
[INFO|2026-04-24 01:12:03] trainer.py:1863 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|2026-04-24 01:12:03] tokenization_utils_base.py:3224 >> chat template saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/chat_template.jinja
[INFO|2026-04-24 01:12:03] tokenization_utils_base.py:2078 >> tokenizer config file saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/tokenizer_config.json
[INFO|2026-04-24 01:12:04] processing_utils.py:870 >> processor saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/processor_config.json
[INFO|2026-04-24 01:12:15] trainer.py:3797 >> Saving model checkpoint to saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3
[INFO|2026-04-24 01:12:15] configuration_utils.py:432 >> Configuration saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/config.json
[INFO|2026-04-24 01:12:15] configuration_utils.py:803 >> Configuration saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/generation_config.json
[INFO|2026-04-24 01:12:26] modeling_utils.py:3380 >> Model weights saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/model.safetensors
[INFO|2026-04-24 01:12:26] tokenization_utils_base.py:3224 >> chat template saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/chat_template.jinja
[INFO|2026-04-24 01:12:26] tokenization_utils_base.py:2078 >> tokenizer config file saved in saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/tokenizer_config.json
[WARNING|2026-04-24 01:12:26] logging.py:149 >> No metric eval_loss to plot.
[WARNING|2026-04-24 01:12:26] logging.py:149 >> No metric eval_accuracy to plot.
[INFO|2026-04-24 01:12:26] modelcard.py:266 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
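
The metric lines in the log above are Python dict literals printed after a `logging.py:144 >>` prefix, so they can be read back without a custom parser. The following is a minimal sketch, not part of the original training tooling: the helper name `parse_metrics` and the regular expression are assumptions made for illustration, and the sample lines are copied from the log.

```python
import ast
import re

# Assumed pattern for the metric lines seen in this log, e.g.
# [INFO|2026-04-24 01:05:23] logging.py:144 >> {'loss': 0.1188, ...}
LINE_RE = re.compile(
    r"^\[INFO\|(?P<ts>[^\]]+)\] logging\.py:\d+ >> (?P<payload>\{.*\})\s*$"
)


def parse_metrics(lines):
    """Return one dict per metric line, with the timestamp stored under 'time'.

    Hypothetical helper: lines that do not match the pattern (checkpoint
    messages, warnings, blank lines) are skipped.
    """
    records = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue
        # The payload is a plain dict literal, so literal_eval is sufficient
        # and avoids executing arbitrary code.
        record = ast.literal_eval(match.group("payload"))
        if not isinstance(record, dict) or "loss" not in record:
            continue
        record["time"] = match.group("ts")
        records.append(record)
    return records


sample = [
    "[INFO|2026-04-24 01:05:23] logging.py:144 >> {'loss': 0.1188, 'learning_rate': 1.3527e-09, 'epoch': 3.00, 'throughput': 643.56}",
    "[INFO|2026-04-24 01:10:45] logging.py:144 >> {'loss': 0.1220, 'learning_rate': 6.0119e-10, 'epoch': 3.00, 'throughput': 643.56}",
    "[INFO|2026-04-24 01:11:30] logging.py:144 >> {'loss': 0.0772, 'learning_rate': 1.5030e-10, 'epoch': 3.00, 'throughput': 643.54}",
    "[INFO|2026-04-24 01:11:41] trainer.py:3797 >> Saving model checkpoint to saves/Qwen3.5-4B-Base/full/Qwen3.5-4B-Base-PumlGenV3/checkpoint-906",
]

records = parse_metrics(sample)
mean_loss = sum(r["loss"] for r in records) / len(records)
print(len(records), "metric lines, mean loss", round(mean_loss, 4))
```

When averaging, note that the last metric line (loss 0.0772) was logged only 45 seconds after the previous one, so it probably covers fewer steps than the other entries; whether to include it is a judgment call.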