---
library_name: transformers
license: apache-2.0
base_model: albert/albert-xxlarge-v2
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: 5a2a5c0a0eb450885cd5fb1af9824857
  results: []
---


# 5a2a5c0a0eb450885cd5fb1af9824857

This model is a fine-tuned version of [albert/albert-xxlarge-v2](https://huggingface.co/albert/albert-xxlarge-v2) on the contemmcm/cls_mmlu dataset.
At the last logged epoch (epoch 24, step 10512), it achieves the following results on the evaluation set:
- Loss: 1.3877
- Data Size: 1.0
- Epoch Runtime: 120.0157
- Accuracy: 0.2487
- F1 Macro: 0.0996
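
For context, the reported loss and accuracy can be compared with a chance-level baseline. The sketch below assumes a four-way classification task; the number of labels is not stated in this card, so `NUM_CLASSES` is an assumption to adjust as needed.

```python
import math

NUM_CLASSES = 4  # assumption: four answer labels; not stated in this card

# Cross-entropy of a uniform prediction over NUM_CLASSES labels.
chance_loss = math.log(NUM_CLASSES)
# Expected accuracy of uniform random guessing.
chance_accuracy = 1.0 / NUM_CLASSES

# Values copied from the evaluation results above.
reported_loss = 1.3877
reported_accuracy = 0.2487

print(f"chance loss     = {chance_loss:.4f} (reported {reported_loss})")
print(f"chance accuracy = {chance_accuracy:.4f} (reported {reported_accuracy})")
```

Under that assumption, both reported values sit close to the chance baseline.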

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
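
The list above can be mapped onto `transformers.TrainingArguments` keyword arguments roughly as follows. This is a minimal sketch, not the original training script: it assumes `train_batch_size` and `eval_batch_size` are per-device values (consistent with 8 × 4 devices = 32), that no gradient accumulation was used (none is listed), and the `output_dir` name is hypothetical.

```python
# Values copied from the hyperparameter list above.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumption: listed size is per device
    "per_device_eval_batch_size": 8,   # assumption: listed size is per device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4  # num_devices from the list above


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective batch size across devices (grad_accum_steps=1 is assumed)."""
    return per_device * num_devices * grad_accum_steps


def build_training_arguments(output_dir="albert-xxlarge-v2-cls-mmlu"):
    """Build TrainingArguments; output_dir is a hypothetical placeholder."""
    # Imported lazily so the arithmetic above runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **training_kwargs)


print(total_batch_size(training_kwargs["per_device_train_batch_size"], NUM_DEVICES))
print(total_batch_size(training_kwargs["per_device_eval_batch_size"], NUM_DEVICES))
```

The multi-GPU setup itself is configured by the launcher (for example `torchrun` or `accelerate launch`), not by these arguments.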

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:--------:|:--------:|
| No log        | 0.0   | 0     | 1.7784          | 0         | 3.2331        | 0.2460   | 0.1725   |
| No log        | 1.0   | 438   | 1.4876          | 0.0078    | 4.3292        | 0.2407   | 0.2224   |
| No log        | 2.0   | 876   | 1.4012          | 0.0156    | 5.1143        | 0.2427   | 0.2079   |
| No log        | 3.0   | 1314  | 1.4484          | 0.0312    | 7.1618        | 0.2620   | 0.1681   |
| No log        | 4.0   | 1752  | 1.3940          | 0.0625    | 11.0504       | 0.2527   | 0.1008   |
| 0.0825        | 5.0   | 2190  | 1.3985          | 0.125     | 18.0212       | 0.2453   | 0.0985   |
| 0.1948        | 6.0   | 2628  | 1.4229          | 0.25      | 32.7564       | 0.2487   | 0.0996   |
| 1.4657        | 7.0   | 3066  | 1.4154          | 0.5       | 61.8446       | 0.2453   | 0.0985   |
| 1.3901        | 8.0   | 3504  | 1.3900          | 1.0       | 121.1050      | 0.2487   | 0.0996   |
| 1.3862        | 9.0   | 3942  | 1.3896          | 1.0       | 120.6294      | 0.2527   | 0.1008   |
| 1.3877        | 10.0  | 4380  | 1.3903          | 1.0       | 119.7003      | 0.2527   | 0.1008   |
| 1.3873        | 11.0  | 4818  | 1.3876          | 1.0       | 120.3298      | 0.2533   | 0.1011   |
| 1.3881        | 12.0  | 5256  | 1.3880          | 1.0       | 120.7355      | 0.2527   | 0.1008   |
| 1.3885        | 13.0  | 5694  | 1.3871          | 1.0       | 119.5208      | 0.2487   | 0.0996   |
| 1.3885        | 14.0  | 6132  | 1.3877          | 1.0       | 120.3651      | 0.2527   | 0.1008   |
| 1.3874        | 15.0  | 6570  | 1.3895          | 1.0       | 120.5198      | 0.2527   | 0.1008   |
| 1.3875        | 16.0  | 7008  | 1.3881          | 1.0       | 119.8156      | 0.2527   | 0.1008   |
| 1.3837        | 17.0  | 7446  | 1.3867          | 1.0       | 120.1265      | 0.2487   | 0.0996   |
| 1.389         | 18.0  | 7884  | 1.3861          | 1.0       | 120.4231      | 0.2527   | 0.1008   |
| 1.3861        | 19.0  | 8322  | 1.3865          | 1.0       | 120.5597      | 0.2533   | 0.1011   |
| 1.3883        | 20.0  | 8760  | 1.3844          | 1.0       | 120.1635      | 0.2527   | 0.1008   |
| 1.387         | 21.0  | 9198  | 1.3880          | 1.0       | 120.1970      | 0.2527   | 0.1008   |
| 1.3874        | 22.0  | 9636  | 1.3847          | 1.0       | 120.1456      | 0.2533   | 0.1011   |
| 1.3855        | 23.0  | 10074 | 1.3882          | 1.0       | 119.9105      | 0.2533   | 0.1011   |
| 1.3856        | 24.0  | 10512 | 1.3877          | 1.0       | 120.0157      | 0.2487   | 0.0996   |
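
From epoch 4 onward, only four distinct (Accuracy, F1 Macro) pairs appear in the table. One way to sanity-check such numbers is to compare them with the macro F1 of a classifier that assigns every example to the same class. The sketch below does that arithmetic, assuming four labels (not stated in this card); it is a consistency check on the reported metrics, not a statement about the model's actual predictions.

```python
def constant_predictor_macro_f1(accuracy, num_classes):
    """Macro F1 when every example is assigned the same class.

    For the predicted class, precision equals the accuracy and recall is 1,
    so its F1 is 2 * accuracy / (1 + accuracy). Every other class is never
    predicted and contributes an F1 of 0 to the macro average.
    """
    f1_predicted_class = 2 * accuracy / (1 + accuracy)
    return f1_predicted_class / num_classes


NUM_CLASSES = 4  # assumption: four answer labels; not stated in this card

# Distinct (accuracy, macro F1) pairs copied from the table, epochs 4 to 24.
reported_pairs = [
    (0.2487, 0.0996),
    (0.2527, 0.1008),
    (0.2533, 0.1011),
    (0.2453, 0.0985),
]

for accuracy, reported_f1 in reported_pairs:
    estimate = constant_predictor_macro_f1(accuracy, NUM_CLASSES)
    print(f"accuracy={accuracy:.4f}  reported F1={reported_f1:.4f}  "
          f"constant-predictor F1={estimate:.4f}")

# If each accuracy were the share of one class in the evaluation set,
# the four values would add up to 1.
accuracy_sum = sum(accuracy for accuracy, _ in reported_pairs)
print(f"sum of the four accuracies = {accuracy_sum:.4f}")
```

Under the four-label assumption, each estimate falls within about 1e-4 of the reported macro F1, which is consistent with (but does not prove) single-class predictions at those epochs.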


### Framework versions

- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1