config.json: 100%  433/433 [00:00<00:00, 53.7kB/s]
model.safetensors: 100%  445M/445M [00:04<00:00, 205MB/s]
Loading weights: 100%  199/199 [00:00<00:00, 974.29it/s, Materializing param=bert.pooler.dense.weight]

BertForSequenceClassification LOAD REPORT from: dbmdz/bert-base-italian-xxl-cased
Key                                        | Status     |
-------------------------------------------+------------+-
cls.seq_relationship.weight                | UNEXPECTED |
cls.predictions.bias                       | UNEXPECTED |
cls.predictions.transform.dense.bias       | UNEXPECTED |
cls.seq_relationship.bias                  | UNEXPECTED |
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |
cls.predictions.transform.dense.weight     | UNEXPECTED |
classifier.bias                            | MISSING    |
classifier.weight                          | MISSING    |

Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

======== Epoch 1 / 3 ========
Training...
  Batch  40 of 378. Elapsed: 0:00:19.
  Batch  80 of 378. Elapsed: 0:00:38.
  Batch 120 of 378. Elapsed: 0:00:56.
  Batch 160 of 378. Elapsed: 0:01:14.
  Batch 200 of 378. Elapsed: 0:01:33.
  Batch 240 of 378. Elapsed: 0:01:51.
  Batch 280 of 378. Elapsed: 0:02:09.
  Batch 320 of 378. Elapsed: 0:02:28.
  Batch 360 of 378. Elapsed: 0:02:46.

  Average training loss: 0.39
  Training took: 0:02:54

Running Validation...
  Average test loss: 0.36
  Validation took: 0:00:15

              precision    recall  f1-score   support

           0       0.80      0.93      0.86      2823
           1       0.90      0.71      0.79      2351

    accuracy                           0.83      5174
   macro avg       0.85      0.82      0.83      5174
weighted avg       0.84      0.83      0.83      5174

======== Epoch 2 / 3 ========
Training...
  Batch  40 of 378. Elapsed: 0:00:18.
  Batch  80 of 378. Elapsed: 0:00:36.
  Batch 120 of 378. Elapsed: 0:00:55.
  Batch 160 of 378. Elapsed: 0:01:13.
  Batch 200 of 378. Elapsed: 0:01:31.
  Batch 240 of 378. Elapsed: 0:01:50.
  Batch 280 of 378. Elapsed: 0:02:08.
  Batch 320 of 378. Elapsed: 0:02:26.
  Batch 360 of 378. Elapsed: 0:02:45.

  Average training loss: 0.20
  Training took: 0:02:53

Running Validation...
  Average test loss: 0.41
  Validation took: 0:00:15

              precision    recall  f1-score   support

           0       0.82      0.91      0.87      2823
           1       0.88      0.77      0.82      2351

    accuracy                           0.85      5174
   macro avg       0.85      0.84      0.84      5174
weighted avg       0.85      0.85      0.85      5174

======== Epoch 3 / 3 ========
Training...
  Batch  40 of 378. Elapsed: 0:00:18.
  Batch  80 of 378. Elapsed: 0:00:36.
  Batch 120 of 378. Elapsed: 0:00:55.
  Batch 160 of 378. Elapsed: 0:01:13.
  Batch 200 of 378. Elapsed: 0:01:31.
  Batch 240 of 378. Elapsed: 0:01:50.
  Batch 280 of 378. Elapsed: 0:02:08.
  Batch 320 of 378. Elapsed: 0:02:26.
  Batch 360 of 378. Elapsed: 0:02:45.

  Average training loss: 0.07
  Training took: 0:02:53

Running Validation...
  Average test loss: 0.60
  Validation took: 0:00:15

              precision    recall  f1-score   support

           0       0.86      0.89      0.88      2823
           1       0.87      0.83      0.85      2351

    accuracy                           0.86      5174
   macro avg       0.86      0.86      0.86      5174
weighted avg       0.86      0.86      0.86      5174

Training complete!