config.json: 100%
433/433 [00:00<00:00, 53.7kB/s]
model.safetensors: 100%
445M/445M [00:04<00:00, 205MB/s]
Loading weights: 100%
199/199 [00:00<00:00, 974.29it/s, Materializing param=bert.pooler.dense.weight]
BertForSequenceClassification LOAD REPORT from: dbmdz/bert-base-italian-xxl-cased
Key                                        | Status     |
-------------------------------------------+------------+-
cls.seq_relationship.weight                | UNEXPECTED |
cls.predictions.bias                       | UNEXPECTED |
cls.predictions.transform.dense.bias       | UNEXPECTED |
cls.seq_relationship.bias                  | UNEXPECTED |
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |
cls.predictions.transform.dense.weight     | UNEXPECTED |
classifier.bias                            | MISSING    |
classifier.weight                          | MISSING    |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
======== Epoch 1 / 3 ========
Training...
Batch 40 of 378. Elapsed: 0:00:19.
Batch 80 of 378. Elapsed: 0:00:38.
Batch 120 of 378. Elapsed: 0:00:56.
Batch 160 of 378. Elapsed: 0:01:14.
Batch 200 of 378. Elapsed: 0:01:33.
Batch 240 of 378. Elapsed: 0:01:51.
Batch 280 of 378. Elapsed: 0:02:09.
Batch 320 of 378. Elapsed: 0:02:28.
Batch 360 of 378. Elapsed: 0:02:46.
Average training loss: 0.39
Training took: 0:02:54
Running Validation...
Average test loss: 0.36
Validation took: 0:00:15
              precision    recall  f1-score   support
           0       0.80      0.93      0.86      2823
           1       0.90      0.71      0.79      2351
    accuracy                           0.83      5174
   macro avg       0.85      0.82      0.83      5174
weighted avg       0.84      0.83      0.83      5174
======== Epoch 2 / 3 ========
Training...
Batch 40 of 378. Elapsed: 0:00:18.
Batch 80 of 378. Elapsed: 0:00:36.
Batch 120 of 378. Elapsed: 0:00:55.
Batch 160 of 378. Elapsed: 0:01:13.
Batch 200 of 378. Elapsed: 0:01:31.
Batch 240 of 378. Elapsed: 0:01:50.
Batch 280 of 378. Elapsed: 0:02:08.
Batch 320 of 378. Elapsed: 0:02:26.
Batch 360 of 378. Elapsed: 0:02:45.
Average training loss: 0.20
Training took: 0:02:53
Running Validation...
Average test loss: 0.41
Validation took: 0:00:15
              precision    recall  f1-score   support
           0       0.82      0.91      0.87      2823
           1       0.88      0.77      0.82      2351
    accuracy                           0.85      5174
   macro avg       0.85      0.84      0.84      5174
weighted avg       0.85      0.85      0.85      5174
======== Epoch 3 / 3 ========
Training...
Batch 40 of 378. Elapsed: 0:00:18.
Batch 80 of 378. Elapsed: 0:00:36.
Batch 120 of 378. Elapsed: 0:00:55.
Batch 160 of 378. Elapsed: 0:01:13.
Batch 200 of 378. Elapsed: 0:01:31.
Batch 240 of 378. Elapsed: 0:01:50.
Batch 280 of 378. Elapsed: 0:02:08.
Batch 320 of 378. Elapsed: 0:02:26.
Batch 360 of 378. Elapsed: 0:02:45.
Average training loss: 0.07
Training took: 0:02:53
Running Validation...
Average test loss: 0.60
Validation took: 0:00:15
              precision    recall  f1-score   support
           0       0.86      0.89      0.88      2823
           1       0.87      0.83      0.85      2351
    accuracy                           0.86      5174
   macro avg       0.86      0.86      0.86      5174
weighted avg       0.86      0.86      0.86      5174
Training complete!
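
Across the three epochs the average training loss falls (0.39, 0.20, 0.07) while the loss reported after "Running Validation..." (labeled "test loss" in the log) rises (0.36, 0.41, 0.60), even though accuracy still improves slightly. The sketch below shows one possible way to summarize that trend and pick a checkpoint. The per-epoch numbers are copied from the log; the helper names `best_epoch` and `epochs_since_improvement` are hypothetical and not part of the original training script.

```python
# Per-epoch averages copied from the log above.
train_loss = [0.39, 0.20, 0.07]
val_loss = [0.36, 0.41, 0.60]       # printed as "Average test loss" in the log
val_accuracy = [0.83, 0.85, 0.86]


def best_epoch(values, higher_is_better=False):
    """Return the 1-based epoch with the best value (hypothetical helper)."""
    pick = max if higher_is_better else min
    return values.index(pick(values)) + 1


def epochs_since_improvement(losses):
    """Count epochs that have passed since the lowest loss was observed."""
    return len(losses) - 1 - losses.index(min(losses))


best_by_loss = best_epoch(val_loss)
best_by_accuracy = best_epoch(val_accuracy, higher_is_better=True)
stale = epochs_since_improvement(val_loss)

# Gap between validation and training loss; a growing gap suggests overfitting.
gaps = [round(v - t, 2) for t, v in zip(train_loss, val_loss)]

print(f"best epoch by validation loss: {best_by_loss}")
print(f"best epoch by accuracy:        {best_by_accuracy}")
print(f"epochs without loss improvement: {stale}")
print(f"validation - training loss gap:  {gaps}")
```

Which criterion to use is a judgment call: selecting by validation loss would favor epoch 1, selecting by accuracy or F1 would favor epoch 3. An early-stopping rule with a patience of two epochs on validation loss would have halted this run at its actual end.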