File size: 75,605 Bytes
[2025-05-13 01:08:08] Created output directory: train_results/google_t5-v1_1-large_ds100_upsample1000
[2025-05-13 01:08:08] Chat mode disabled
[2025-05-13 01:08:08] Model size is 3B or smaller (0 B). Using full fine-tuning.
[2025-05-13 01:08:08] Adjusted parameters for t5 model:
[2025-05-13 01:08:08] - LEARNING_RATE: 1e-4
[2025-05-13 01:08:08] - BATCH_SIZE: 64
[2025-05-13 01:08:08] - GRADIENT_ACCUMULATION_STEPS: 1
[2025-05-13 01:08:08] No QA format data will be used
[2025-05-13 01:08:08] Limiting dataset size to: 100 samples
[2025-05-13 01:08:08] =======================================
[2025-05-13 01:08:08] Starting training for model: google/t5-v1_1-large
[2025-05-13 01:08:08] =======================================
[2025-05-13 01:08:08] CUDA_VISIBLE_DEVICES: 2,3
[2025-05-13 01:08:08] WANDB_PROJECT: wikidyk-ar
[2025-05-13 01:08:08] DATA_PATH: data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json
[2025-05-13 01:08:08] Global Batch Size: 128
[2025-05-13 01:08:08] Data Size: 100
[2025-05-13 01:08:08] Executing command: torchrun --nproc_per_node "2" --master-port 29501 src/train.py --model_name_or_path "google/t5-v1_1-large" --data_path "data/wikidyk2022-2025_01082025_gpt-4o_evalv2_pages_formatted_combined_v2.json" --output_dir "train_results/google_t5-v1_1-large_ds100_upsample1000" --num_upsample "1000" --per_device_train_batch_size "64" --gradient_accumulation_steps "1" --learning_rate "1e-4" --num_train_epochs "1" --model_max_length "32768" --report_to wandb --logging_steps 50 --save_strategy no --bf16 True --use_flash_attention_2 True --qa_data_ratio "-1" --predict_mask "false" --ds_size 100
[2025-05-13 01:08:08] Training started at Tue May 13 01:08:08 UTC 2025
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792]
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792] *****************************************
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0513 01:08:10.171000 443637 site-packages/torch/distributed/run.py:792] *****************************************
WARNING:root:Output directory: train_results/google_t5-v1_1-large_ds100_upsample1000
WARNING:root:Output directory: train_results/google_t5-v1_1-large_ds100_upsample1000
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
WARNING:root:Loading data...
WARNING:root:Loading data...
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 100000
WARNING:root:Dataset initialized with all QA data:
WARNING:root: - 0 QA examples
WARNING:root: - 100 fact examples with upsampling factor 1000
WARNING:root: - Total examples: 100000
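The batch-size and dataset figures reported above can be cross-checked with a little arithmetic. The sketch below uses only values copied from the log; the variable names are illustrative, and the use of a ceiling (keeping the final partial batch) is an assumption that happens to reproduce the 782-step total shown by the progress bar.

```python
import math

# Values copied from the log above; names are illustrative only.
per_device_batch_size = 64   # --per_device_train_batch_size
num_processes = 2            # torchrun --nproc_per_node
grad_accum_steps = 1         # --gradient_accumulation_steps
fact_examples = 100          # --ds_size
upsample_factor = 1000       # --num_upsample

global_batch_size = per_device_batch_size * num_processes * grad_accum_steps
total_examples = fact_examples * upsample_factor

# Assumption: the last, partially filled batch is kept, hence the ceiling.
steps_per_epoch = math.ceil(total_examples / global_batch_size)

print(global_batch_size, total_examples, steps_per_epoch)
```

With one epoch requested, 100000 / 128 = 781.25, which rounds up to the 782 optimizer steps reported by the progress bar.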
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
/root/yuwei/WikiDYKEvalV2/src/train.py:119: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Trainer.__init__`. Use `processing_class` instead.
trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
warnings.warn(
wandb: Currently logged in as: yuweiz to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
[rank1]:[W513 01:08:40.694530638 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /root/yuwei/WikiDYKEvalV2/wandb/run-20250513_010840-ovxz0tx6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_results/google_t5-v1_1-large_ds100_upsample1000
wandb: ⭐️ View project at https://wandb.ai/yuweiz/wikidyk-ar
wandb: 🚀 View run at https://wandb.ai/yuweiz/wikidyk-ar/runs/ovxz0tx6
  0%|          | 0/782 [00:00<?, ?it/s]
/root/miniconda3/envs/wikidyk/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:3980: UserWarning: `as_target_tokenizer` is deprecated and will be removed in v5 of Transformers. You can tokenize your labels by using the argument `text_target` of the regular `__call__` method (either in the same call as your input texts if you use the same keyword arguments, or in a separate call.
warnings.warn(
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
[rank0]:[W513 01:08:40.426723468 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
  6%|▋         | 50/782 [00:29<06:33,  1.86it/s]
{'loss': 28.359, 'grad_norm': 74.1916732788086, 'learning_rate': 9.373401534526855e-05, 'epoch': 0.06}
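The `learning_rate` values in these metric records are consistent with a linear decay from the configured 1e-4 to zero over the 782 steps, with no warmup. The sketch below is a hedged reconstruction, not the training script's code: the function name is hypothetical, and the one-step offset (the value logged at step N equals the schedule after N-1 updates) is inferred purely from the logged numbers.

```python
def logged_learning_rate(step, base_lr=1e-4, total_steps=782):
    """Learning rate expected in the metric record printed at `step`.

    Assumptions (inferred from the logged values, not from the source code):
    linear decay to zero, no warmup, and a one-step offset between the
    step counter and the scheduler state that gets logged.
    """
    return base_lr * (total_steps - (step - 1)) / total_steps


# Compare against two values copied from the log.
for step, logged in [(50, 9.373401534526855e-05), (100, 8.734015345268543e-05)]:
    print(step, logged_learning_rate(step), logged)
```

The same records report `epoch` as the step count divided by 782 and rounded to two decimals, for example 50 / 782 ≈ 0.06.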
 13%|█▎        | 100/782 [00:57<06:23,  1.78it/s]
{'loss': 19.0603, 'grad_norm': 20.285858154296875, 'learning_rate': 8.734015345268543e-05, 'epoch': 0.13}
 19%|█▉        | 150/782 [01:24<05:44,  1.84it/s]
{'loss': 15.0203, 'grad_norm': 185.91448974609375, 'learning_rate': 8.094629156010231e-05, 'epoch': 0.19}
 26%|██▌       | 200/782 [01:51<05:20,  1.81it/s]
{'loss': 13.3998, 'grad_norm': 26.281652450561523, 'learning_rate': 7.455242966751919e-05, 'epoch': 0.26}
 32%|███▏      | 250/782 [02:18<04:40,  1.90it/s]
{'loss': 12.2939, 'grad_norm': 15.767915725708008, 'learning_rate': 6.815856777493607e-05, 'epoch': 0.32}
 38%|███▊      | 300/782 [02:46<04:16,  1.88it/s]
{'loss': 5.9866, 'grad_norm': 6.948846817016602, 'learning_rate': 6.176470588235295e-05, 'epoch': 0.38}
 45%|████▍     | 350/782 [03:12<04:02,  1.78it/s]
{'loss': 0.1617, 'grad_norm': 0.6942098140716553, 'learning_rate': 5.537084398976983e-05, 'epoch': 0.45}
 51%|█████     | 400/782 [03:40<03:24,  1.87it/s]
{'loss': 0.0779, 'grad_norm': 0.5859019756317139, 'learning_rate': 4.89769820971867e-05, 'epoch': 0.51}
 58%|█████▊    | 450/782 [04:06<02:56,  1.88it/s]
{'loss': 0.0536, 'grad_norm': 0.3546956777572632, 'learning_rate': 4.2583120204603584e-05, 'epoch': 0.58}
 64%|██████▍   | 500/782 [04:33<02:27,  1.91it/s]
{'loss': 0.0399, 'grad_norm': 0.542077362537384, 'learning_rate': 3.6189258312020464e-05, 'epoch': 0.64}
 70%|███████   | 550/782 [05:00<02:02,  1.89it/s]
{'loss': 0.0336, 'grad_norm': 0.38286203145980835, 'learning_rate': 2.9795396419437344e-05, 'epoch': 0.7}
70%|βββββββ | 551/782 [05:00<02:03, 1.87it/s]
71%|βββββββ | 552/782 [05:01<02:03, 1.86it/s]
71%|βββββββ | 553/782 [05:01<02:03, 1.85it/s]
71%|βββββββ | 554/782 [05:02<02:02, 1.86it/s]
71%|βββββββ | 555/782 [05:02<02:01, 1.87it/s]
71%|βββββββ | 556/782 [05:03<02:02, 1.85it/s]
71%|βββββββ | 557/782 [05:03<02:08, 1.75it/s]
71%|ββββββββ | 558/782 [05:04<02:09, 1.73it/s]
71%|ββββββββ | 559/782 [05:05<02:08, 1.74it/s]
72%|ββββββββ | 560/782 [05:05<02:04, 1.79it/s]
72%|ββββββββ | 561/782 [05:06<02:09, 1.71it/s]
72%|ββββββββ | 562/782 [05:06<02:13, 1.65it/s]
72%|ββββββββ | 563/782 [05:07<02:19, 1.57it/s]
72%|ββββββββ | 564/782 [05:08<02:16, 1.59it/s]
72%|ββββββββ | 565/782 [05:08<02:17, 1.58it/s]
72%|ββββββββ | 566/782 [05:09<02:19, 1.55it/s]
73%|ββββββββ | 567/782 [05:10<02:18, 1.55it/s]
73%|ββββββββ | 568/782 [05:10<02:15, 1.58it/s]
73%|ββββββββ | 569/782 [05:11<02:20, 1.51it/s]
73%|ββββββββ | 570/782 [05:12<02:16, 1.55it/s]
73%|ββββββββ | 571/782 [05:12<02:16, 1.54it/s]
73%|ββββββββ | 572/782 [05:13<02:16, 1.54it/s]
73%|ββββββββ | 573/782 [05:14<02:19, 1.50it/s]
73%|ββββββββ | 574/782 [05:14<02:16, 1.53it/s]
74%|ββββββββ | 575/782 [05:15<02:15, 1.52it/s]
74%|ββββββββ | 576/782 [05:16<02:16, 1.50it/s]
74%|ββββββββ | 577/782 [05:16<02:15, 1.52it/s]
74%|ββββββββ | 578/782 [05:17<02:18, 1.47it/s]
74%|ββββββββ | 579/782 [05:18<02:12, 1.53it/s]
74%|ββββββββ | 580/782 [05:18<02:12, 1.53it/s]
74%|ββββββββ | 581/782 [05:19<02:15, 1.48it/s]
74%|ββββββββ | 582/782 [05:20<02:10, 1.53it/s]
75%|ββββββββ | 583/782 [05:20<02:11, 1.51it/s]
75%|ββββββββ | 584/782 [05:21<02:10, 1.51it/s]
75%|ββββββββ | 585/782 [05:22<02:07, 1.55it/s]
75%|ββββββββ | 586/782 [05:22<02:08, 1.53it/s]
75%|ββββββββ | 587/782 [05:23<02:11, 1.49it/s]
75%|ββββββββ | 588/782 [05:24<02:07, 1.52it/s]
75%|ββββββββ | 589/782 [05:24<02:09, 1.49it/s]
75%|ββββββββ | 590/782 [05:25<02:09, 1.49it/s]
76%|ββββββββ | 591/782 [05:26<02:05, 1.52it/s]
76%|ββββββββ | 592/782 [05:26<02:03, 1.54it/s]
76%|ββββββββ | 593/782 [05:27<02:01, 1.55it/s]
76%|ββββββββ | 594/782 [05:27<02:02, 1.53it/s]
76%|ββββββββ | 595/782 [05:28<02:01, 1.54it/s]
76%|ββββββββ | 596/782 [05:29<02:00, 1.54it/s]
76%|ββββββββ | 597/782 [05:29<02:01, 1.52it/s]
76%|ββββββββ | 598/782 [05:30<02:00, 1.53it/s]
77%|ββββββββ | 599/782 [05:31<01:56, 1.57it/s]
77%|ββββββββ | 600/782 [05:31<01:57, 1.54it/s]
{'loss': 0.0286, 'grad_norm': 0.4268437325954437, 'learning_rate': 2.340153452685422e-05, 'epoch': 0.77}
77%|ββββββββ | 601/782 [05:32<01:59, 1.52it/s]
77%|ββββββββ | 602/782 [05:33<01:56, 1.55it/s]
77%|ββββββββ | 603/782 [05:33<01:58, 1.51it/s]
77%|ββββββββ | 604/782 [05:34<02:01, 1.47it/s]
77%|ββββββββ | 605/782 [05:35<01:58, 1.49it/s]
77%|ββββββββ | 606/782 [05:35<01:55, 1.52it/s]
78%|ββββββββ | 607/782 [05:36<01:55, 1.51it/s]
78%|ββββββββ | 608/782 [05:37<01:50, 1.57it/s]
78%|ββββββββ | 609/782 [05:37<01:52, 1.54it/s]
78%|ββββββββ | 610/782 [05:38<01:57, 1.46it/s]
78%|ββββββββ | 611/782 [05:39<01:53, 1.51it/s]
78%|ββββββββ | 612/782 [05:39<01:55, 1.48it/s]
78%|ββββββββ | 613/782 [05:40<01:51, 1.52it/s]
79%|ββββββββ | 614/782 [05:41<01:48, 1.54it/s]
79%|ββββββββ | 615/782 [05:41<01:48, 1.54it/s]
79%|ββββββββ | 616/782 [05:42<01:51, 1.49it/s]
79%|ββββββββ | 617/782 [05:43<01:48, 1.53it/s]
79%|ββββββββ | 618/782 [05:43<01:50, 1.48it/s]
79%|ββββββββ | 619/782 [05:44<01:50, 1.48it/s]
79%|ββββββββ | 620/782 [05:45<01:48, 1.49it/s]
79%|ββββββββ | 621/782 [05:45<01:48, 1.48it/s]
80%|ββββββββ | 622/782 [05:46<01:47, 1.49it/s]
80%|ββββββββ | 623/782 [05:47<01:43, 1.54it/s]
80%|ββββββββ | 624/782 [05:47<01:42, 1.53it/s]
80%|ββββββββ | 625/782 [05:48<01:42, 1.53it/s]
80%|ββββββββ | 626/782 [05:49<01:42, 1.52it/s]
80%|ββββββββ | 627/782 [05:49<01:43, 1.50it/s]
80%|ββββββββ | 628/782 [05:50<01:42, 1.51it/s]
80%|ββββββββ | 629/782 [05:51<01:38, 1.56it/s]
81%|ββββββββ | 630/782 [05:51<01:43, 1.46it/s]
81%|ββββββββ | 631/782 [05:52<01:41, 1.48it/s]
81%|ββββββββ | 632/782 [05:52<01:33, 1.61it/s]
81%|ββββββββ | 633/782 [05:53<01:28, 1.69it/s]
81%|ββββββββ | 634/782 [05:54<01:27, 1.69it/s]
81%|ββββββββ | 635/782 [05:54<01:22, 1.78it/s]
81%|βββββββββ | 636/782 [05:55<01:22, 1.77it/s]
81%|βββββββββ | 637/782 [05:55<01:23, 1.73it/s]
82%|βββββββββ | 638/782 [05:56<01:21, 1.77it/s]
82%|βββββββββ | 639/782 [05:56<01:18, 1.82it/s]
82%|βββββββββ | 640/782 [05:57<01:18, 1.81it/s]
82%|βββββββββ | 641/782 [05:57<01:17, 1.83it/s]
82%|βββββββββ | 642/782 [05:58<01:14, 1.87it/s]
82%|βββββββββ | 643/782 [05:58<01:15, 1.85it/s]
82%|βββββββββ | 644/782 [05:59<01:14, 1.86it/s]
82%|βββββββββ | 645/782 [06:00<01:13, 1.85it/s]
83%|βββββββββ | 646/782 [06:00<01:13, 1.85it/s]
83%|βββββββββ | 647/782 [06:01<01:14, 1.82it/s]
83%|βββββββββ | 648/782 [06:01<01:11, 1.86it/s]
83%|βββββββββ | 649/782 [06:02<01:10, 1.88it/s]
83%|βββββββββ | 650/782 [06:02<01:11, 1.85it/s]
{'loss': 0.0237, 'grad_norm': 0.49761509895324707, 'learning_rate': 1.70076726342711e-05, 'epoch': 0.83}
83%|βββββββββ | 651/782 [06:03<01:10, 1.87it/s]
83%|βββββββββ | 652/782 [06:03<01:08, 1.89it/s]
84%|βββββββββ | 653/782 [06:04<01:11, 1.81it/s]
84%|βββββββββ | 654/782 [06:04<01:10, 1.83it/s]
84%|βββββββββ | 655/782 [06:05<01:07, 1.88it/s]
84%|βββββββββ | 656/782 [06:05<01:09, 1.83it/s]
84%|βββββββββ | 657/782 [06:06<01:08, 1.81it/s]
84%|βββββββββ | 658/782 [06:07<01:06, 1.88it/s]
84%|βββββββββ | 659/782 [06:07<01:06, 1.85it/s]
84%|βββββββββ | 660/782 [06:08<01:08, 1.79it/s]
85%|βββββββββ | 661/782 [06:08<01:04, 1.87it/s]
85%|βββββββββ | 662/782 [06:09<01:04, 1.86it/s]
85%|βββββββββ | 663/782 [06:09<01:05, 1.81it/s]
85%|βββββββββ | 664/782 [06:10<01:03, 1.86it/s]
85%|βββββββββ | 665/782 [06:10<01:03, 1.86it/s]
85%|βββββββββ | 666/782 [06:11<01:05, 1.78it/s]
85%|βββββββββ | 667/782 [06:11<01:01, 1.88it/s]
85%|βββββββββ | 668/782 [06:12<01:00, 1.88it/s]
86%|βββββββββ | 669/782 [06:12<01:00, 1.87it/s]
86%|βββββββββ | 670/782 [06:13<00:59, 1.88it/s]
86%|βββββββββ | 671/782 [06:14<00:58, 1.90it/s]
86%|βββββββββ | 672/782 [06:14<00:57, 1.92it/s]
86%|βββββββββ | 673/782 [06:15<00:58, 1.88it/s]
86%|βββββββββ | 674/782 [06:15<00:56, 1.90it/s]
86%|βββββββββ | 675/782 [06:16<00:55, 1.92it/s]
86%|βββββββββ | 676/782 [06:16<00:58, 1.82it/s]
87%|βββββββββ | 677/782 [06:17<00:58, 1.79it/s]
87%|βββββββββ | 678/782 [06:17<00:55, 1.88it/s]
87%|βββββββββ | 679/782 [06:18<00:55, 1.86it/s]
87%|βββββββββ | 680/782 [06:18<00:57, 1.77it/s]
87%|βββββββββ | 681/782 [06:19<00:53, 1.88it/s]
87%|βββββββββ | 682/782 [06:19<00:54, 1.84it/s]
87%|βββββββββ | 683/782 [06:20<00:54, 1.81it/s]
87%|βββββββββ | 684/782 [06:21<00:51, 1.90it/s]
88%|βββββββββ | 685/782 [06:21<00:52, 1.85it/s]
88%|βββββββββ | 686/782 [06:22<00:52, 1.83it/s]
88%|βββββββββ | 687/782 [06:22<00:50, 1.89it/s]
88%|βββββββββ | 688/782 [06:23<00:49, 1.88it/s]
88%|βββββββββ | 689/782 [06:23<00:49, 1.89it/s]
88%|βββββββββ | 690/782 [06:24<00:49, 1.87it/s]
88%|βββββββββ | 691/782 [06:24<00:48, 1.89it/s]
88%|βββββββββ | 692/782 [06:25<00:47, 1.90it/s]
89%|βββββββββ | 693/782 [06:25<00:46, 1.90it/s]
89%|βββββββββ | 694/782 [06:26<00:46, 1.90it/s]
89%|βββββββββ | 695/782 [06:26<00:47, 1.84it/s]
89%|βββββββββ | 696/782 [06:27<00:45, 1.89it/s]
89%|βββββββββ | 697/782 [06:27<00:45, 1.87it/s]
89%|βββββββββ | 698/782 [06:28<00:44, 1.88it/s]
89%|βββββββββ | 699/782 [06:29<00:44, 1.85it/s]
90%|βββββββββ | 700/782 [06:29<00:43, 1.88it/s]
{'loss': 0.0215, 'grad_norm': 0.2186804562807083, 'learning_rate': 1.061381074168798e-05, 'epoch': 0.9}
90%|βββββββββ | 701/782 [06:30<00:43, 1.87it/s]
90%|βββββββββ | 702/782 [06:30<00:44, 1.80it/s]
90%|βββββββββ | 703/782 [06:31<00:44, 1.79it/s]
90%|βββββββββ | 704/782 [06:31<00:41, 1.86it/s]
90%|βββββββββ | 705/782 [06:32<00:41, 1.84it/s]
90%|βββββββββ | 706/782 [06:32<00:40, 1.87it/s]
90%|βββββββββ | 707/782 [06:33<00:39, 1.89it/s]
91%|βββββββββ | 708/782 [06:33<00:39, 1.89it/s]
91%|βββββββββ | 709/782 [06:34<00:38, 1.89it/s]
91%|βββββββββ | 710/782 [06:34<00:39, 1.83it/s]
91%|βββββββββ | 711/782 [06:35<00:38, 1.86it/s]
91%|βββββββββ | 712/782 [06:36<00:37, 1.88it/s]
91%|βββββββββ | 713/782 [06:36<00:37, 1.84it/s]
91%|ββββββββββ| 714/782 [06:37<00:37, 1.82it/s]
91%|ββββββββββ| 715/782 [06:37<00:35, 1.87it/s]
92%|ββββββββββ| 716/782 [06:38<00:35, 1.86it/s]
92%|ββββββββββ| 717/782 [06:38<00:35, 1.84it/s]
92%|ββββββββββ| 718/782 [06:39<00:34, 1.87it/s]
92%|ββββββββββ| 719/782 [06:39<00:33, 1.86it/s]
92%|ββββββββββ| 720/782 [06:40<00:33, 1.87it/s]
92%|ββββββββββ| 721/782 [06:40<00:32, 1.87it/s]
92%|ββββββββββ| 722/782 [06:41<00:31, 1.91it/s]
92%|ββββββββββ| 723/782 [06:41<00:30, 1.91it/s]
93%|ββββββββββ| 724/782 [06:42<00:31, 1.84it/s]
93%|ββββββββββ| 725/782 [06:43<00:31, 1.82it/s]
93%|ββββββββββ| 726/782 [06:43<00:31, 1.77it/s]
93%|ββββββββββ| 727/782 [06:44<00:30, 1.80it/s]
93%|ββββββββββ| 728/782 [06:44<00:29, 1.81it/s]
93%|ββββββββββ| 729/782 [06:45<00:29, 1.82it/s]
93%|ββββββββββ| 730/782 [06:45<00:28, 1.85it/s]
93%|ββββββββββ| 731/782 [06:46<00:27, 1.86it/s]
94%|ββββββββββ| 732/782 [06:46<00:27, 1.85it/s]
94%|ββββββββββ| 733/782 [06:47<00:26, 1.88it/s]
94%|ββββββββββ| 734/782 [06:47<00:25, 1.86it/s]
94%|ββββββββββ| 735/782 [06:48<00:25, 1.85it/s]
94%|ββββββββββ| 736/782 [06:49<00:24, 1.88it/s]
94%|ββββββββββ| 737/782 [06:49<00:23, 1.89it/s]
94%|ββββββββββ| 738/782 [06:50<00:23, 1.88it/s]
95%|ββββββββββ| 739/782 [06:50<00:22, 1.90it/s]
95%|ββββββββββ| 740/782 [06:51<00:22, 1.90it/s]
95%|ββββββββββ| 741/782 [06:51<00:22, 1.85it/s]
95%|ββββββββββ| 742/782 [06:52<00:21, 1.89it/s]
95%|ββββββββββ| 743/782 [06:52<00:20, 1.91it/s]
95%|ββββββββββ| 744/782 [06:53<00:20, 1.84it/s]
95%|ββββββββββ| 745/782 [06:53<00:19, 1.88it/s]
95%|ββββββββββ| 746/782 [06:54<00:19, 1.88it/s]
96%|ββββββββββ| 747/782 [06:54<00:19, 1.83it/s]
96%|ββββββββββ| 748/782 [06:55<00:18, 1.82it/s]
96%|ββββββββββ| 749/782 [06:55<00:17, 1.86it/s]
96%|ββββββββββ| 750/782 [06:56<00:17, 1.81it/s]
{'loss': 0.0212, 'grad_norm': 0.354276180267334, 'learning_rate': 4.219948849104859e-06, 'epoch': 0.96}
96%|ββββββββββ| 751/782 [06:57<00:17, 1.73it/s]
96%|ββββββββββ| 752/782 [06:57<00:16, 1.82it/s]
96%|ββββββββββ| 753/782 [06:58<00:15, 1.83it/s]
96%|ββββββββββ| 754/782 [06:58<00:15, 1.80it/s]
97%|ββββββββββ| 755/782 [06:59<00:14, 1.82it/s]
97%|ββββββββββ| 756/782 [06:59<00:14, 1.83it/s]
97%|ββββββββββ| 757/782 [07:00<00:13, 1.82it/s]
97%|ββββββββββ| 758/782 [07:00<00:13, 1.80it/s]
97%|ββββββββββ| 759/782 [07:01<00:12, 1.83it/s]
97%|ββββββββββ| 760/782 [07:02<00:11, 1.89it/s]
97%|ββββββββββ| 761/782 [07:02<00:11, 1.90it/s]
97%|ββββββββββ| 762/782 [07:03<00:10, 1.90it/s]
98%|ββββββββββ| 763/782 [07:03<00:09, 1.91it/s]
98%|ββββββββββ| 764/782 [07:04<00:09, 1.89it/s]
98%|ββββββββββ| 765/782 [07:04<00:09, 1.88it/s]
98%|ββββββββββ| 766/782 [07:05<00:08, 1.86it/s]
98%|ββββββββββ| 767/782 [07:05<00:08, 1.85it/s]
98%|ββββββββββ| 768/782 [07:06<00:07, 1.87it/s]
98%|ββββββββββ| 769/782 [07:06<00:06, 1.87it/s]
98%|ββββββββββ| 770/782 [07:07<00:06, 1.86it/s]
99%|ββββββββββ| 771/782 [07:07<00:05, 1.90it/s]
99%|ββββββββββ| 772/782 [07:08<00:05, 1.92it/s]
99%|ββββββββββ| 773/782 [07:08<00:04, 1.85it/s]
99%|ββββββββββ| 774/782 [07:09<00:04, 1.85it/s]
99%|ββββββββββ| 775/782 [07:09<00:03, 1.91it/s]
99%|ββββββββββ| 776/782 [07:10<00:03, 1.90it/s]
99%|ββββββββββ| 777/782 [07:11<00:02, 1.90it/s]
99%|ββββββββββ| 778/782 [07:11<00:02, 1.92it/s]
100%|ββββββββββ| 779/782 [07:12<00:01, 1.89it/s]
100%|ββββββββββ| 780/782 [07:12<00:01, 1.90it/s]
100%|ββββββββββ| 781/782 [07:13<00:00, 1.94it/s]
100%|ββββββββββ| 782/782 [07:13<00:00, 1.92it/s]
{'train_runtime': 434.6484, 'train_samples_per_second': 230.071, 'train_steps_per_second': 1.799, 'train_loss': 6.048197149010876, 'epoch': 1.0}
100%|ββββββββββ| 782/782 [07:13<00:00, 1.80it/s]
model.safetensors: 0%| | 0.00/3.13G [00:00<?, ?B/s]
spiece.model: 0%| | 0.00/792k [00:00<?, ?B/s]
Upload 3 LFS files: 0%| | 0/3 [00:00<?, ?it/s]
training_args.bin: 0%| | 0.00/5.37k [00:00<?, ?B/s]
training_args.bin: 100%|ββββββββββ| 5.37k/5.37k [00:00<00:00, 59.3kB/s]
model.safetensors: 0%| | 852k/3.13G [00:00<07:00, 7.45MB/s]
spiece.model: 100%|ββββββββββ| 792k/792k [00:00<00:00, 5.49MB/s]
model.safetensors: 1%| | 16.0M/3.13G [00:00<01:09, 44.7MB/s]
model.safetensors: 1%| | 32.0M/3.13G [00:00<00:58, 52.7MB/s]
model.safetensors: 2%|β | 48.0M/3.13G [00:00<00:56, 54.4MB/s]
model.safetensors: 2%|β | 64.0M/3.13G [00:01<00:53, 56.9MB/s]
model.safetensors: 3%|β | 80.0M/3.13G [00:01<00:56, 53.8MB/s]
model.safetensors: 3%|β | 96.0M/3.13G [00:01<00:56, 54.1MB/s]
model.safetensors: 4%|β | 112M/3.13G [00:02<00:53, 56.6MB/s]
model.safetensors: 4%|β | 128M/3.13G [00:02<00:58, 51.2MB/s]
model.safetensors: 5%|β | 144M/3.13G [00:02<00:56, 53.1MB/s]
model.safetensors: 5%|β | 160M/3.13G [00:02<00:52, 56.1MB/s]
model.safetensors: 6%|β | 176M/3.13G [00:03<00:55, 53.0MB/s]
model.safetensors: 6%|β | 192M/3.13G [00:03<00:56, 52.3MB/s]
model.safetensors: 7%|β | 208M/3.13G [00:03<00:53, 55.0MB/s]
model.safetensors: 7%|β | 224M/3.13G [00:04<00:51, 56.2MB/s]
model.safetensors: 8%|β | 240M/3.13G [00:04<00:50, 57.4MB/s]
model.safetensors: 8%|β | 256M/3.13G [00:04<00:49, 58.6MB/s]
model.safetensors: 9%|β | 272M/3.13G [00:04<00:48, 58.7MB/s]
model.safetensors: 9%|β | 288M/3.13G [00:05<00:48, 59.0MB/s]
model.safetensors: 10%|β | 304M/3.13G [00:05<00:47, 59.2MB/s]
model.safetensors: 10%|β | 320M/3.13G [00:05<00:47, 59.1MB/s]
model.safetensors: 11%|β | 336M/3.13G [00:06<00:45, 61.0MB/s]
model.safetensors: 11%|β | 352M/3.13G [00:06<00:46, 60.3MB/s]
model.safetensors: 12%|ββ | 368M/3.13G [00:06<00:43, 64.0MB/s]
model.safetensors: 12%|ββ | 384M/3.13G [00:06<00:44, 62.3MB/s]
model.safetensors: 13%|ββ | 400M/3.13G [00:07<00:44, 62.1MB/s]
model.safetensors: 13%|ββ | 416M/3.13G [00:07<00:44, 61.6MB/s]
model.safetensors: 14%|ββ | 432M/3.13G [00:07<00:47, 56.5MB/s]
model.safetensors: 14%|ββ | 448M/3.13G [00:07<00:46, 57.7MB/s]
model.safetensors: 15%|ββ | 464M/3.13G [00:08<00:43, 62.0MB/s]
model.safetensors: 15%|ββ | 480M/3.13G [00:08<00:43, 60.9MB/s]
model.safetensors: 16%|ββ | 496M/3.13G [00:08<00:59, 44.5MB/s]
model.safetensors: 16%|ββ | 512M/3.13G [00:09<00:54, 48.0MB/s]
model.safetensors: 17%|ββ | 528M/3.13G [00:09<00:49, 53.0MB/s]
model.safetensors: 17%|ββ | 544M/3.13G [00:09<00:48, 53.8MB/s]
model.safetensors: 18%|ββ | 560M/3.13G [00:10<01:09, 36.8MB/s]
model.safetensors: 18%|ββ | 576M/3.13G [00:10<01:02, 40.8MB/s]
model.safetensors: 19%|ββ | 592M/3.13G [00:11<00:56, 45.1MB/s]
model.safetensors: 19%|ββ | 608M/3.13G [00:11<00:50, 50.1MB/s]
model.safetensors: 20%|ββ | 624M/3.13G [00:11<00:51, 49.1MB/s]
model.safetensors: 20%|ββ | 640M/3.13G [00:11<00:47, 52.2MB/s]
model.safetensors: 21%|ββ | 656M/3.13G [00:12<00:49, 49.7MB/s]
model.safetensors: 21%|βββ | 672M/3.13G [00:12<00:50, 48.9MB/s]
model.safetensors: 22%|βββ | 688M/3.13G [00:12<00:46, 52.7MB/s]
model.safetensors: 22%|βββ | 704M/3.13G [00:13<00:45, 53.8MB/s]
model.safetensors: 23%|βββ | 720M/3.13G [00:13<00:43, 55.1MB/s]
model.safetensors: 23%|βββ | 736M/3.13G [00:17<03:41, 10.8MB/s]
model.safetensors: 24%|βββ | 745M/3.13G [00:17<03:02, 13.1MB/s]
model.safetensors: 24%|βββ | 752M/3.13G [00:17<02:44, 14.5MB/s]
model.safetensors: 25%|βββ | 768M/3.13G [00:18<02:04, 19.0MB/s]
model.safetensors: 25%|βββ | 784M/3.13G [00:18<01:34, 24.8MB/s]
model.safetensors: 26%|βββ | 800M/3.13G [00:18<01:14, 31.1MB/s]
model.safetensors: 26%|βββ | 816M/3.13G [00:19<01:02, 37.3MB/s]
model.safetensors: 27%|βββ | 832M/3.13G [00:19<00:57, 39.9MB/s]
model.safetensors: 27%|βββ | 848M/3.13G [00:19<00:50, 45.7MB/s]
model.safetensors: 28%|βββ | 864M/3.13G [00:19<00:43, 52.7MB/s]
model.safetensors: 28%|βββ | 880M/3.13G [00:20<00:42, 53.0MB/s]
model.safetensors: 29%|βββ | 896M/3.13G [00:20<00:40, 55.6MB/s]
model.safetensors: 29%|βββ | 912M/3.13G [00:20<00:38, 57.8MB/s]
model.safetensors: 30%|βββ | 928M/3.13G [00:20<00:37, 58.2MB/s]
model.safetensors: 30%|βββ | 944M/3.13G [00:21<00:35, 62.1MB/s]
model.safetensors: 31%|βββ | 960M/3.13G [00:21<00:34, 62.3MB/s]
model.safetensors: 31%|βββ | 976M/3.13G [00:21<00:35, 60.3MB/s]
model.safetensors: 32%|ββββ | 992M/3.13G [00:22<00:35, 59.6MB/s]
model.safetensors: 32%|ββββ | 1.01G/3.13G [00:22<00:34, 61.8MB/s]
model.safetensors: 33%|ββββ | 1.02G/3.13G [00:22<00:34, 60.4MB/s]
model.safetensors: 33%|ββββ | 1.04G/3.13G [00:22<00:34, 60.9MB/s]
model.safetensors: 34%|ββββ | 1.06G/3.13G [00:22<00:31, 66.4MB/s]
model.safetensors: 34%|ββββ | 1.07G/3.13G [00:23<00:35, 58.3MB/s]
model.safetensors: 35%|ββββ | 1.09G/3.13G [00:23<00:34, 60.0MB/s]
model.safetensors: 35%|ββββ | 1.10G/3.13G [00:23<00:34, 59.1MB/s]
model.safetensors: 36%|ββββ | 1.12G/3.13G [00:24<00:32, 61.0MB/s]
model.safetensors: 36%|ββββ | 1.14G/3.13G [00:24<00:32, 61.5MB/s]
model.safetensors: 37%|ββββ | 1.15G/3.13G [00:24<00:37, 52.4MB/s]
model.safetensors: 37%|ββββ | 1.17G/3.13G [00:25<00:35, 56.1MB/s]
model.safetensors: 38%|ββββ | 1.18G/3.13G [00:25<00:42, 46.2MB/s]
model.safetensors: 38%|ββββ | 1.20G/3.13G [00:25<00:38, 50.1MB/s]
model.safetensors: 39%|ββββ | 1.22G/3.13G [00:26<00:36, 52.8MB/s]
model.safetensors: 39%|ββββ | 1.23G/3.13G [00:26<00:36, 52.0MB/s]
model.safetensors: 40%|ββββ | 1.25G/3.13G [00:26<00:34, 54.1MB/s]
model.safetensors: 40%|ββββ | 1.26G/3.13G [00:26<00:32, 56.6MB/s]
model.safetensors: 41%|ββββ | 1.28G/3.13G [00:27<00:31, 58.0MB/s]
model.safetensors: 41%|βββββ | 1.30G/3.13G [00:27<00:31, 58.3MB/s]
model.safetensors: 42%|βββββ | 1.31G/3.13G [00:27<00:29, 62.2MB/s]
model.safetensors: 42%|βββββ | 1.33G/3.13G [00:27<00:29, 60.6MB/s]
model.safetensors: 43%|βββββ | 1.34G/3.13G [00:28<00:29, 60.2MB/s]
model.safetensors: 43%|βββββ | 1.36G/3.13G [00:28<00:29, 60.4MB/s]
model.safetensors: 44%|βββββ | 1.38G/3.13G [00:29<00:40, 43.0MB/s]
model.safetensors: 44%|βββββ | 1.39G/3.13G [00:29<00:38, 45.5MB/s]
model.safetensors: 45%|βββββ | 1.41G/3.13G [00:29<00:34, 49.6MB/s]
model.safetensors: 45%|βββββ | 1.42G/3.13G [00:29<00:34, 48.9MB/s]
model.safetensors: 46%|βββββ | 1.44G/3.13G [00:30<00:30, 55.2MB/s]
model.safetensors: 46%|βββββ | 1.46G/3.13G [00:30<00:29, 56.8MB/s]
model.safetensors: 47%|βββββ | 1.47G/3.13G [00:30<00:28, 58.1MB/s]
model.safetensors: 47%|βββββ | 1.49G/3.13G [00:30<00:27, 59.7MB/s]
model.safetensors: 48%|βββββ | 1.50G/3.13G [00:31<00:30, 54.2MB/s]
model.safetensors: 49%|βββββ | 1.52G/3.13G [00:31<00:35, 46.1MB/s]
model.safetensors: 49%|βββββ | 1.54G/3.13G [00:32<00:32, 49.7MB/s]
model.safetensors: 50%|βββββ | 1.55G/3.13G [00:32<00:30, 51.6MB/s]
model.safetensors: 50%|βββββ | 1.57G/3.13G [00:32<00:29, 53.3MB/s]
model.safetensors: 51%|βββββ | 1.58G/3.13G [00:32<00:28, 54.2MB/s]
model.safetensors: 51%|βββββ | 1.60G/3.13G [00:33<00:27, 56.4MB/s]
model.safetensors: 52%|ββββββ | 1.62G/3.13G [00:33<00:26, 57.2MB/s]
model.safetensors: 52%|ββββββ | 1.63G/3.13G [00:33<00:26, 56.1MB/s]
model.safetensors: 53%|ββββββ | 1.65G/3.13G [00:33<00:25, 58.6MB/s]
model.safetensors: 53%|ββββββ | 1.66G/3.13G [00:34<00:25, 57.5MB/s]
model.safetensors: 54%|ββββββ | 1.68G/3.13G [00:34<00:24, 59.0MB/s]
model.safetensors: 54%|ββββββ | 1.70G/3.13G [00:34<00:23, 60.8MB/s]
model.safetensors: 55%|ββββββ | 1.71G/3.13G [00:35<00:33, 42.0MB/s]
model.safetensors: 55%|ββββββ | 1.73G/3.13G [00:35<00:34, 41.1MB/s]
model.safetensors: 56%|ββββββ | 1.74G/3.13G [00:36<00:29, 46.4MB/s]
model.safetensors: 56%|ββββββ | 1.76G/3.13G [00:36<00:26, 51.5MB/s]
model.safetensors: 57%|ββββββ | 1.78G/3.13G [00:36<00:24, 54.4MB/s]
model.safetensors: 57%|ββββββ | 1.79G/3.13G [00:36<00:23, 56.6MB/s]
model.safetensors: 58%|ββββββ | 1.81G/3.13G [00:37<00:22, 58.5MB/s]
model.safetensors: 58%|ββββββ | 1.82G/3.13G [00:37<00:21, 60.9MB/s]
model.safetensors: 59%|ββββββ | 1.84G/3.13G [00:37<00:19, 65.2MB/s]
model.safetensors: 59%|ββββββ | 1.86G/3.13G [00:38<00:31, 40.5MB/s]
model.safetensors: 60%|ββββββ | 1.87G/3.13G [00:38<00:27, 45.3MB/s]
model.safetensors: 60%|ββββββ | 1.89G/3.13G [00:38<00:25, 48.9MB/s]
model.safetensors: 61%|ββββββ | 1.90G/3.13G [00:38<00:23, 51.6MB/s]
model.safetensors: 61%|βββββββ | 1.92G/3.13G [00:39<00:22, 53.8MB/s]
model.safetensors: 62%|βββββββ | 1.94G/3.13G [00:39<00:22, 52.4MB/s]
model.safetensors: 62%|βββββββ | 1.95G/3.13G [00:39<00:21, 55.9MB/s]
model.safetensors: 63%|βββββββ | 1.97G/3.13G [00:40<00:20, 57.0MB/s]
model.safetensors: 63%|βββββββ | 1.98G/3.13G [00:40<00:19, 58.5MB/s]
model.safetensors: 64%|βββββββ | 2.00G/3.13G [00:40<00:19, 58.3MB/s]
model.safetensors: 64%|βββββββ | 2.02G/3.13G [00:40<00:19, 58.2MB/s]
model.safetensors: 65%|βββββββ | 2.03G/3.13G [00:41<00:17, 61.9MB/s]
model.safetensors: 65%|βββββββ | 2.05G/3.13G [00:41<00:19, 54.8MB/s]
model.safetensors: 66%|βββββββ | 2.06G/3.13G [00:41<00:18, 56.9MB/s]
model.safetensors: 66%|βββββββ | 2.08G/3.13G [00:42<00:18, 56.4MB/s]
model.safetensors: 67%|βββββββ | 2.10G/3.13G [00:42<00:19, 52.0MB/s]
model.safetensors: 67%|βββββββ | 2.11G/3.13G [00:42<00:19, 53.1MB/s]
model.safetensors: 68%|βββββββ | 2.13G/3.13G [00:42<00:18, 55.1MB/s]
model.safetensors: 68%|βββββββ | 2.14G/3.13G [00:43<00:17, 57.2MB/s]
model.safetensors: 69%|βββββββ | 2.16G/3.13G [00:43<00:21, 45.4MB/s]
model.safetensors: 69%|βββββββ | 2.18G/3.13G [00:44<00:19, 48.5MB/s]
model.safetensors: 70%|βββββββ | 2.19G/3.13G [00:44<00:18, 51.7MB/s]
model.safetensors: 70%|βββββββ | 2.21G/3.13G [00:44<00:16, 55.3MB/s]
model.safetensors: 71%|βββββββ | 2.22G/3.13G [00:44<00:16, 55.1MB/s]
model.safetensors: 72%|ββββββββ | 2.24G/3.13G [00:45<00:15, 56.5MB/s]
model.safetensors: 72%|ββββββββ | 2.26G/3.13G [00:45<00:15, 57.7MB/s]
model.safetensors: 73%|ββββββββ | 2.27G/3.13G [00:45<00:20, 41.8MB/s]
model.safetensors: 73%|ββββββββ | 2.29G/3.13G [00:46<00:22, 37.5MB/s]
model.safetensors: 74%|ββββββββ | 2.30G/3.13G [00:46<00:19, 41.9MB/s]
model.safetensors: 74%|ββββββββ | 2.32G/3.13G [00:47<00:18, 43.9MB/s]
model.safetensors: 75%|ββββββββ | 2.34G/3.13G [00:47<00:16, 47.9MB/s]
model.safetensors: 75%|ββββββββ | 2.35G/3.13G [00:47<00:15, 51.7MB/s]
model.safetensors: 76%|ββββββββ | 2.37G/3.13G [00:48<00:23, 32.6MB/s]
model.safetensors: 76%|ββββββββ | 2.38G/3.13G [00:48<00:20, 37.2MB/s]
model.safetensors: 77%|ββββββββ | 2.40G/3.13G [00:49<00:18, 40.5MB/s]
model.safetensors: 77%|ββββββββ | 2.42G/3.13G [00:49<00:16, 43.5MB/s]
model.safetensors: 78%|ββββββββ | 2.43G/3.13G [00:49<00:14, 47.8MB/s]
model.safetensors: 78%|ββββββββ | 2.45G/3.13G [00:49<00:13, 50.7MB/s]
model.safetensors: 79%|ββββββββ | 2.46G/3.13G [00:50<00:12, 53.2MB/s]
model.safetensors: 79%|ββββββββ | 2.48G/3.13G [00:50<00:12, 53.1MB/s]
model.safetensors: 80%|ββββββββ | 2.50G/3.13G [00:50<00:12, 53.0MB/s]
model.safetensors: 80%|ββββββββ | 2.51G/3.13G [00:51<00:11, 53.4MB/s]
model.safetensors: 81%|ββββββββ | 2.53G/3.13G [00:51<00:10, 56.8MB/s]
model.safetensors: 81%|ββββββββ | 2.54G/3.13G [00:51<00:10, 58.6MB/s]
model.safetensors: 82%|βββββββββ | 2.56G/3.13G [00:51<00:09, 60.3MB/s]
model.safetensors: 82%|βββββββββ | 2.58G/3.13G [00:52<00:08, 63.6MB/s]
model.safetensors: 83%|βββββββββ | 2.59G/3.13G [00:52<00:08, 63.5MB/s]
model.safetensors: 83%|βββββββββ | 2.61G/3.13G [00:52<00:08, 61.8MB/s]
model.safetensors: 84%|βββββββββ | 2.62G/3.13G [00:52<00:08, 60.4MB/s]
model.safetensors: 84%|βββββββββ | 2.64G/3.13G [00:53<00:08, 61.3MB/s]
model.safetensors: 85%|βββββββββ | 2.66G/3.13G [00:53<00:07, 61.7MB/s]
model.safetensors: 85%|βββββββββ | 2.67G/3.13G [00:53<00:07, 60.1MB/s]
model.safetensors: 86%|βββββββββ | 2.69G/3.13G [00:53<00:07, 57.3MB/s]
model.safetensors: 86%|βββββββββ | 2.70G/3.13G [00:54<00:07, 54.9MB/s]
model.safetensors: 87%|βββββββββ | 2.72G/3.13G [00:54<00:06, 59.5MB/s]
model.safetensors: 87%|βββββββββ | 2.74G/3.13G [00:54<00:06, 60.2MB/s]
model.safetensors: 88%|βββββββββ | 2.75G/3.13G [00:55<00:06, 59.9MB/s]
model.safetensors: 88%|βββββββββ | 2.77G/3.13G [00:55<00:05, 61.7MB/s]
model.safetensors: 89%|βββββββββ | 2.78G/3.13G [00:55<00:05, 60.8MB/s]
model.safetensors: 89%|βββββββββ | 2.80G/3.13G [00:55<00:05, 65.7MB/s]
model.safetensors: 90%|βββββββββ | 2.82G/3.13G [00:56<00:04, 64.5MB/s]
model.safetensors: 90%|βββββββββ | 2.83G/3.13G [00:56<00:05, 58.5MB/s]
model.safetensors: 91%|βββββββββ | 2.85G/3.13G [00:56<00:04, 59.6MB/s]
model.safetensors: 91%|ββββββββββ| 2.86G/3.13G [00:56<00:04, 58.4MB/s]
model.safetensors: 92%|ββββββββββ| 2.88G/3.13G [00:57<00:04, 59.9MB/s]
model.safetensors: 92%|ββββββββββ| 2.90G/3.13G [00:57<00:03, 64.9MB/s]
model.safetensors: 93%|ββββββββββ| 2.91G/3.13G [00:57<00:04, 44.3MB/s]
model.safetensors: 93%|ββββββββββ| 2.93G/3.13G [00:58<00:04, 47.8MB/s]
model.safetensors: 94%|ββββββββββ| 2.94G/3.13G [00:58<00:03, 51.5MB/s]
model.safetensors: 94%|ββββββββββ| 2.96G/3.13G [00:58<00:03, 56.2MB/s]
model.safetensors: 95%|ββββββββββ| 2.98G/3.13G [00:59<00:02, 57.0MB/s]
model.safetensors: 96%|ββββββββββ| 2.99G/3.13G [00:59<00:02, 59.5MB/s]
model.safetensors: 96%|ββββββββββ| 3.01G/3.13G [00:59<00:02, 59.4MB/s]
model.safetensors: 97%|ββββββββββ| 3.02G/3.13G [00:59<00:01, 58.0MB/s]
model.safetensors: 97%|ββββββββββ| 3.04G/3.13G [01:00<00:01, 58.0MB/s]
model.safetensors: 98%|ββββββββββ| 3.06G/3.13G [01:00<00:01, 60.3MB/s]
model.safetensors: 98%|ββββββββββ| 3.07G/3.13G [01:00<00:01, 58.3MB/s]
model.safetensors: 99%|ββββββββββ| 3.09G/3.13G [01:00<00:00, 58.8MB/s]
model.safetensors: 99%|ββββββββββ| 3.10G/3.13G [01:01<00:00, 61.4MB/s]
model.safetensors: 100%|ββββββββββ| 3.12G/3.13G [01:01<00:00, 60.7MB/s]
model.safetensors: 100%|ββββββββββ| 3.13G/3.13G [01:01<00:00, 50.8MB/s]
Upload 3 LFS files: 33%|ββββ | 1/3 [01:01<02:03, 61.90s/it]
Upload 3 LFS files: 100%|ββββββββββ| 3/3 [01:01<00:00, 20.63s/it]