---
tags:
- generated_from_trainer
model-index:
- name: tinyllama-1.1B-intermediate-step-715k-1.5T-dpo-lora-v4
  results: []
---


# tinyllama-1.1B-intermediate-step-715k-1.5T-dpo-lora-v4

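Judging by its name and the logged preference metrics, this model appears to be a DPO (Direct Preference Optimization) fine-tune with LoRA of a TinyLlama 1.1B intermediate checkpoint (step 715k, 1.5T tokens). The Trainer did not record the base model or the training dataset.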
It achieves the following results on the evaluation set:
- Loss: 0.6904
- Rewards/chosen: -3.5271
- Rewards/rejected: -5.6475
- Rewards/accuracies: 0.7393
- Rewards/margins: 2.1205
- Logps/rejected: -394.1334
- Logps/chosen: -478.6117
- Logits/rejected: -3.8937
- Logits/chosen: -4.0184
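
As a consistency check on the numbers above, the reward margin should equal the chosen reward minus the rejected reward, up to rounding. The sketch below assumes the metrics follow the usual DPO trainer convention (margin is the mean of chosen minus rejected over the evaluation pairs); the variable names are illustrative and not part of the training code.

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -3.5271
rewards_rejected = -5.6475
reported_margin = 2.1205

# Assumption: the margin is the mean of (chosen - rejected) over eval pairs,
# so the difference of the two reported means should match it up to rounding.
computed_margin = rewards_chosen - rewards_rejected

# Each reported value is rounded to 4 decimals, so allow a small tolerance.
margin_matches = abs(computed_margin - reported_margin) < 2e-4
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

Under the same assumed convention, a negative reward means the fine-tuned model assigns a lower log-probability to a completion than the reference model does. Both rewards are negative here, and the positive margin indicates that rejected completions are penalized more than chosen ones.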

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 32
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.02
- num_epochs: 3
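
The batch-size and scheduler settings above interact as sketched below. This is a minimal illustration, assuming a single training process (the list reports no device count) and the standard linear warmup-then-decay shape. `linear_lr` and `num_processes` are hypothetical names, and `total_steps` is a made-up value, since the card does not state the total number of optimizer steps.

```python
import math

train_batch_size = 2               # per-device batch size from the list above
gradient_accumulation_steps = 32
num_processes = 1                  # assumption: not stated in the card

# Number of examples contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_processes
print(total_train_batch_size)  # 64, matching total_train_batch_size above


def linear_lr(step: int, total_steps: int, peak_lr: float = 1e-3,
              warmup_ratio: float = 0.02) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


total_steps = 1000  # hypothetical; the actual step count is not given in the card
for step in (0, 10, 20, 510, 1000):
    print(step, linear_lr(step, total_steps))
```

With a warmup ratio of 0.02, only the first 2% of optimizer steps ramp up the learning rate; the remaining steps decay it linearly from 0.001 to zero.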

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5491        | 0.34  | 300  | 0.5719          | -0.5176        | -1.3357          | 0.7015             | 0.8181          | -351.0149      | -448.5167    | -4.0592         | -4.2257       |
| 0.5906        | 0.68  | 600  | 0.5625          | -0.3365        | -1.2779          | 0.7191             | 0.9414          | -350.4370      | -446.7061    | -4.0731         | -4.2239       |
| 0.2857        | 1.02  | 900  | 0.5723          | -0.3882        | -1.5979          | 0.7141             | 1.2097          | -353.6368      | -447.2226    | -4.0753         | -4.2332       |
| 0.2679        | 1.36  | 1200 | 0.5883          | -1.1630        | -2.3423          | 0.7234             | 1.1793          | -361.0811      | -454.9714    | -4.0115         | -4.1888       |
| 0.2310        | 1.71  | 1500 | 0.5895          | -1.3278        | -2.7966          | 0.7338             | 1.4688          | -365.6242      | -456.6194    | -4.0069         | -4.1696       |
| 0.0862        | 2.05  | 1800 | 0.6626          | -2.7764        | -4.6708          | 0.7284             | 1.8944          | -384.3661      | -471.1047    | -3.9624         | -4.0992       |
| 0.0804        | 2.39  | 2100 | 0.6818          | -3.0330        | -5.1156          | 0.7410             | 2.0826          | -388.8140      | -473.6706    | -3.9128         | -4.0467       |
| 0.0925        | 2.73  | 2400 | 0.6947          | -3.5621        | -5.6537          | 0.7371             | 2.0916          | -394.1956      | -478.9623    | -3.8908         | -4.0137       |
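
The table rewards reading validation loss and the reward metrics separately. Validation loss is lowest at step 600 and rises afterwards while training loss drops from about 0.55 to below 0.1, whereas the reward margin grows to about 2.09 and reward accuracy peaks at step 2100. A minimal sketch of that comparison follows, with the relevant columns copied by hand from the table; the tuple layout and variable names are illustrative.

```python
# (step, training loss, validation loss, reward accuracy, reward margin)
rows = [
    (300, 0.5491, 0.5719, 0.7015, 0.8181),
    (600, 0.5906, 0.5625, 0.7191, 0.9414),
    (900, 0.2857, 0.5723, 0.7141, 1.2097),
    (1200, 0.2679, 0.5883, 0.7234, 1.1793),
    (1500, 0.231, 0.5895, 0.7338, 1.4688),
    (1800, 0.0862, 0.6626, 0.7284, 1.8944),
    (2100, 0.0804, 0.6818, 0.7410, 2.0826),
    (2400, 0.0925, 0.6947, 0.7371, 2.0916),
]

# Pick the evaluation step that is best under each criterion.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_accuracy = max(rows, key=lambda r: r[3])

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
```

Which checkpoint is preferable depends on the metric that matters for the intended use; the card does not say how the final checkpoint was selected.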


### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1