YWZBrandon: End of training (commit 67a170a, verified)
[2025-05-13 01:08:08] Created output directory: train_results/google_t5-v1_1-large_ds100_upsample1000
[2025-05-13 01:08:08] Chat mode disabled
[2025-05-13 01:08:08] Model size is 3B or smaller (0 B). Using full fine-tuning.
[2025-05-13 01:08:08] Adjusted parameters for t5 model:
[2025-05-13 01:08:08] - LEARNING_RATE: 1e-4
[2025-05-13 01:08:08] - BATCH_SIZE: 64
[2025-05-13 01:08:08] - GRADIENT_ACCUMULATION_STEPS: 1
[2025-05-13 01:08:08] No QA format data will be used
[2025-05-13 01:08:08] Limiting dataset size to: 100 samples
[2025-05-13 01:08:08] =======================================
[2025-05-13 01:08:08] Starting training for model: google/t5-v1_1-large
[2025-05-13 01:08:08] =======================================
[2025-05-13 01:08:08] CUDA_VISIBLE_DEVICES: 2,3
[2025-05-13 01:08:08] WANDB_PROJECT: wikidyk-ar
[2025-05-13 01:08:08] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-13 01:08:08] Global Batch Size: 128
[2025-05-13 01:08:08] Data Size: 100
[2025-05-13 01:08:08] Executing command: torchrun --nproc_per_node "2" --master-port 29501 src/train.py --model_name_or_path "google/t5-v1_1-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_t5-v1_1-large_ds100_upsample1000" --num_upsample "1000" --per_device_train_batch_size "64" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false" --ds_size 100
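The launch flags above determine two things: the global batch size of 128 and the 782-step length of the single epoch that appears in the progress bar. The sketch below reproduces that arithmetic with a hypothetical `planned_steps` helper. It assumes that the distributed sampler pads the data so each rank sees ceil(N / nproc) examples, and that each rank's DataLoader keeps the final partial batch, which is the Trainer default `dataloader_drop_last=False`.

```python
import math

def planned_steps(num_facts, upsample, nproc, per_device_bs, grad_accum, epochs=1):
    """Return (total examples, global batch size, optimizer steps).

    Hypothetical helper. Assumes each of the `nproc` ranks sees ceil(N / nproc)
    examples and that the last partial batch is kept (drop_last=False).
    """
    total_examples = num_facts * upsample               # --ds_size x --num_upsample
    global_batch = nproc * per_device_bs * grad_accum   # examples per optimizer step
    per_rank = math.ceil(total_examples / nproc)
    batches_per_rank = math.ceil(per_rank / per_device_bs)
    steps = (batches_per_rank // grad_accum) * epochs
    return total_examples, global_batch, steps

print(planned_steps(100, 1000, nproc=2, per_device_bs=64, grad_accum=1))
# (100000, 128, 782)
```

100000 / 128 = 781.25, so the run needs 782 optimizer steps, and the last one uses a partial batch.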
[2025-05-13 01:08:08] Training started at Tue May 13 01:08:08 UTC 2025
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792]
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792] *****************************************
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results/google_t5-v1_1-large_ds100_upsample1000
WARNING:root:Output directory: train_results/google_t5-v1_1-large_ds100_upsample1000
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 100000
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 100000
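In the dataset summary, 100 fact examples with an upsampling factor of 1000 produce 100000 training examples, so every fact is repeated 1000 times. The sketch below uses a hypothetical `upsample_facts` helper. The real `src/train.py` may order or mix the copies differently.

```python
def upsample_facts(facts, factor):
    """Hypothetical helper: repeat every fact example `factor` times."""
    # Copies of a fact sit next to each other here; by default the Trainer's
    # (distributed) sampler shuffles the examples before batching.
    return [fact for fact in facts for _ in range(factor)]

facts = [f"fact {i}" for i in range(100)]  # stand-ins for the 100 sampled facts
dataset = upsample_facts(facts, 1000)
print(len(dataset))  # 100000
```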
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
warnings.warn(
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
[rank1]:[W513 01:08:40.694530638 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
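Two of the warnings above can be handled with standard `transformers.TrainingArguments` fields: the W&B notice about `run_name` matching `output_dir`, and the DDP notice that `find_unused_parameters=True` found no unused parameters. The relevant fields are `run_name` and `ddp_find_unused_parameters`. The dictionary below is a sketch that mirrors the logged torchrun flags. The `run_name` value is a hypothetical example. The script's other options (`--model_max_length`, `--num_upsample`, `--ds_size`, and so on) are omitted because they are not `TrainingArguments` fields. Turning the flag off is only safe if every parameter receives a gradient on every step, which is what the warning reports for this run.

```python
# Sketch of TrainingArguments keyword arguments mirroring the logged command;
# the intended use is transformers.TrainingArguments(**training_kwargs).
training_kwargs = {
    "output_dir": "train_results/google_t5-v1_1-large_ds100_upsample1000",
    "run_name": "t5-v1_1-large-ds100-up1000",  # hypothetical name, distinct from output_dir
    "per_device_train_batch_size": 64,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-4,
    "num_train_epochs": 1,
    "logging_steps": 50,
    "save_strategy": "no",
    "bf16": True,
    "report_to": "wandb",
    # DDP found no unused parameters, so skip the extra autograd-graph traversal.
    "ddp_find_unused_parameters": False,
}
```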
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250513_010840-ovxz0tx6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results/google_t5-v1_1-large_ds100_upsample1000
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/ovxz0tx6
0%| | 0/782 [00:00<?, ?it/s]
/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
warnings.warn(
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
[rank0]:[W513 01:08:40.426723468 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
6%|▋ | 50/782 [00:29<06:33, 1.86it/s]
{'loss': 28.359, 'grad_norm': 74.1916732788086, 'learning_rate': 9.373401534526855e-05, 'epoch': 0.06}
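The logged learning rates fit a linear decay from 1e-4 to zero over 782 steps with no warmup, which matches the `TrainingArguments` defaults `lr_scheduler_type="linear"` and `warmup_steps=0`. For example, the value at step 50 equals 1e-4 x 733/782. The logged `epoch` values also match step / 782 rounded to two decimals. The sketch below checks both. The offset `step - 1` is inferred from the logged numbers: it assumes the scheduler had advanced one step fewer than the global step when the value was recorded.

```python
BASE_LR = 1e-4     # --learning_rate
TOTAL_STEPS = 782  # optimizer steps in the single epoch

def linear_decay_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Learning rate logged at global `step`, assuming linear decay to 0 with
    no warmup and a scheduler that had advanced step - 1 times."""
    scheduler_step = step - 1
    return base_lr * max(0.0, (total_steps - scheduler_step) / total_steps)

def logged_epoch(step, total_steps=TOTAL_STEPS):
    """Fractional epoch at `step`, rounded to two decimals as in the log."""
    return round(step / total_steps, 2)

for step in (50, 250, 500):
    print(step, linear_decay_lr(step), logged_epoch(step))
```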
13%|█▎ | 100/782 [00:57<06:23, 1.78it/s]
{'loss': 19.0603, 'grad_norm': 20.285858154296875, 'learning_rate': 8.734015345268543e-05, 'epoch': 0.13}
19%|█▉ | 150/782 [01:24<05:44, 1.84it/s]
{'loss': 15.0203, 'grad_norm': 185.91448974609375, 'learning_rate': 8.094629156010231e-05, 'epoch': 0.19}
26%|██▌ | 200/782 [01:51<05:20, 1.81it/s]
{'loss': 13.3998, 'grad_norm': 26.281652450561523, 'learning_rate': 7.455242966751919e-05, 'epoch': 0.26}
32%|███▏ | 250/782 [02:18<04:40, 1.90it/s]
{'loss': 12.2939, 'grad_norm': 15.767915725708008, 'learning_rate': 6.815856777493607e-05, 'epoch': 0.32}
38%|███▊ | 300/782 [02:46<04:16, 1.88it/s]
{'loss': 5.9866, 'grad_norm': 6.948846817016602, 'learning_rate': 6.176470588235295e-05, 'epoch': 0.38}
45%|████▍ | 350/782 [03:12<04:02, 1.78it/s]
{'loss': 0.1617, 'grad_norm': 0.6942098140716553, 'learning_rate': 5.537084398976983e-05, 'epoch': 0.45}
51%|█████ | 400/782 [03:40<03:24, 1.87it/s]
{'loss': 0.0779, 'grad_norm': 0.5859019756317139, 'learning_rate': 4.89769820971867e-05, 'epoch': 0.51}
58%|█████▊ | 450/782 [04:06<02:56, 1.88it/s]
{'loss': 0.0536, 'grad_norm': 0.3546956777572632, 'learning_rate': 4.2583120204603584e-05, 'epoch': 0.58}
64%|██████▍ | 500/782 [04:33<02:27, 1.91it/s]
{'loss': 0.0399, 'grad_norm': 0.542077362537384, 'learning_rate': 3.6189258312020464e-05, 'epoch': 0.64}
70%|███████ | 550/782 [05:00<02:02, 1.89it/s] {'loss': 0.0336, 'grad_norm': 0.38286203145980835, 'learning_rate': 2.9795396419437344e-05, 'epoch': 0.7}
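The periodic metric records in this log are Python dict literals, so a loss or learning-rate curve can be pulled out of the raw text using only the standard library. This is a minimal sketch under two assumptions. First, the log has already been read into a string. Second, every periodic record starts with `{'loss':`. That holds for the records here. The end-of-training summary starts with `{'train_runtime':` and is deliberately skipped. `parse_metric_records` is a hypothetical helper, not part of `src/train.py`.

```python
import ast
import re

# Matches dict literals like "{'loss': 0.0536, 'grad_norm': ..., 'epoch': 0.58}".
# Assumes a record contains no nested braces, which holds for the records in this log.
RECORD_RE = re.compile(r"\{'loss':[^{}]*\}")


def parse_metric_records(log_text: str) -> list[dict]:
    """Return every periodic training-metric record found in log_text, in order."""
    return [ast.literal_eval(m.group(0)) for m in RECORD_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "58%|█████▊ | 450/782 [04:06<02:56, 1.88it/s] "
        "{'loss': 0.0536, 'grad_norm': 0.3546956777572632, "
        "'learning_rate': 4.2583120204603584e-05, 'epoch': 0.58}\n"
        "64%|██████▍ | 500/782 [04:33<02:27, 1.91it/s] "
        "{'loss': 0.0399, 'grad_norm': 0.542077362537384, "
        "'learning_rate': 3.6189258312020464e-05, 'epoch': 0.64}\n"
    )
    for rec in parse_metric_records(sample):
        print(rec["epoch"], rec["loss"])
```

`ast.literal_eval` is used rather than `eval` because it only accepts literals. Arbitrary code embedded in a log line cannot execute.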
77%|███████▋ | 600/782 [05:31<01:57, 1.54it/s] {'loss': 0.0286, 'grad_norm': 0.4268437325954437, 'learning_rate': 2.340153452685422e-05, 'epoch': 0.77}
83%|████████▎ | 650/782 [06:02<01:11, 1.85it/s] {'loss': 0.0237, 'grad_norm': 0.49761509895324707, 'learning_rate': 1.70076726342711e-05, 'epoch': 0.83}
90%|████████▉ | 700/782 [06:29<00:43, 1.88it/s] {'loss': 0.0215, 'grad_norm': 0.2186804562807083, 'learning_rate': 1.061381074168798e-05, 'epoch': 0.9}
96%|█████████▌| 750/782 [06:56<00:17, 1.81it/s] {'loss': 0.0212, 'grad_norm': 0.354276180267334, 'learning_rate': 4.219948849104859e-06, 'epoch': 0.96}
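Every `learning_rate` logged from step 450 to step 750 fits a linear decay from the launch value `1e-4` to zero over the 782 optimizer steps, with no warmup. The fit requires evaluating the schedule at step s − 1 for the record logged at step s. For example, 1e-4 × 333/782 ≈ 4.2583e-05 at step 450. The offset plausibly means the logged value is the rate applied during that step's update, before the scheduler advanced. That reading is an inference from these numbers; the log does not state it. `linear_lr` is an illustrative helper mirroring the shape of a linear-to-zero schedule, not the Trainer's own code.

```python
import math

PEAK_LR = 1e-4     # --learning_rate from the launch command
TOTAL_STEPS = 782  # length of the training progress bar


def linear_lr(step: int, peak: float = PEAK_LR, total: int = TOTAL_STEPS) -> float:
    """Assumed schedule: linear decay from `peak` to 0 over `total` steps, no warmup."""
    return peak * max(0.0, (total - step) / total)


# (global step, logged learning_rate) pairs copied from the metric records in this log.
LOGGED = {
    450: 4.2583120204603584e-05,
    500: 3.6189258312020464e-05,
    550: 2.9795396419437344e-05,
    600: 2.340153452685422e-05,
    650: 1.70076726342711e-05,
    700: 1.061381074168798e-05,
    750: 4.219948849104859e-06,
}

if __name__ == "__main__":
    for step, lr in LOGGED.items():
        predicted = linear_lr(step - 1)  # the record at step s matches the schedule at s - 1
        print(step, lr, predicted, math.isclose(lr, predicted, rel_tol=1e-9))
```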
100%|██████████| 782/782 [07:13<00:00, 1.92it/s] {'train_runtime': 434.6484, 'train_samples_per_second': 230.071, 'train_steps_per_second': 1.799, 'train_loss': 6.048197149010876, 'epoch': 1.0}
100%|██████████| 782/782 [07:13<00:00, 1.80it/s]
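The end-of-training summary is consistent with the launch settings. This assumes `--num_upsample 1000` repeats each of the `--ds_size 100` examples 1000 times, so the run sees 100,000 examples. With 64 examples per device on 2 processes and no gradient accumulation, the global batch is 128, which matches the logged Global Batch Size. That gives ceil(100000 / 128) = 782 steps. Dividing by `train_runtime` then reproduces the reported per-second rates. A sketch of that arithmetic:

```python
import math

DS_SIZE = 100               # --ds_size
NUM_UPSAMPLE = 1000         # --num_upsample (assumed: each example repeated this many times)
PER_DEVICE_BATCH = 64       # --per_device_train_batch_size
NUM_PROCESSES = 2           # --nproc_per_node
GRAD_ACCUM = 1              # --gradient_accumulation_steps
TRAIN_RUNTIME_S = 434.6484  # 'train_runtime' from the summary record


def run_stats() -> dict:
    num_samples = DS_SIZE * NUM_UPSAMPLE
    global_batch = PER_DEVICE_BATCH * NUM_PROCESSES * GRAD_ACCUM
    steps = math.ceil(num_samples / global_batch)  # the final partial batch still counts as a step
    return {
        "num_samples": num_samples,
        "global_batch": global_batch,
        "steps": steps,
        "samples_per_second": round(num_samples / TRAIN_RUNTIME_S, 3),
        "steps_per_second": round(steps / TRAIN_RUNTIME_S, 3),
    }


if __name__ == "__main__":
    print(run_stats())
```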
model.safetensors: 0%| | 0.00/3.13G [00:00<?, ?B/s]
spiece.model: 0%| | 0.00/792k [00:00<?, ?B/s]
Upload 3 LFS files: 0%| | 0/3 [00:00<?, ?it/s]
training_args.bin: 100%|██████████| 5.37k/5.37k [00:00<00:00, 59.3kB/s]
spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 5.49MB/s]
model.safetensors: 100%|██████████| 3.13G/3.13G [01:01<00:00, 50.8MB/s]
Upload 3 LFS files: 100%|██████████| 3/3 [01:01<00:00, 20.63s/it]