This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5807 | 0.2834 | 50 | 0.6822 | -101.8611 | -1.8709 | 0.7088 | 0.7088 | 0.7088 | 0.5196 | 0.5888 | 0.5103 |
| 0.4908 | 0.5668 | 100 | 0.6802 | -105.0768 | -1.8507 | 0.6854 | 0.6854 | 0.6854 | 0.5300 | 0.5888 | 0.5103 |
| 0.4191 | 0.8503 | 150 | 0.6960 | -108.5704 | -2.1205 | 0.7127 | 0.7127 | 0.7127 | 0.5403 | 0.5888 | 0.5103 |
| 0.2287 | 1.1337 | 200 | 0.7276 | -115.4432 | -2.0764 | 0.7403 | 0.7403 | 0.7403 | 0.5362 | 0.5888 | 0.5103 |
| 0.2329 | 1.4171 | 250 | 0.7454 | -118.2405 | -2.0640 | 0.7706 | 0.7706 | 0.7706 | 0.5351 | 0.5888 | 0.5103 |
| 0.2036 | 1.7005 | 300 | 0.7574 | -120.7682 | -1.9746 | 0.7851 | 0.7851 | 0.7851 | 0.5434 | 0.5888 | 0.5103 |
| 0.2102 | 1.9839 | 350 | 0.7556 | -120.0429 | -1.9737 | 0.7840 | 0.7840 | 0.7840 | 0.5403 | 0.5888 | 0.5103 |
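In the table above, the training loss keeps falling while the validation loss stops improving early, so the last checkpoint is not necessarily the best one. The following is a minimal sketch of how a checkpoint could be picked from these logged values. The selection criteria (lowest validation loss, highest Ranking Simple) and the helper names are illustrative assumptions, not the author's procedure, and treating a higher Ranking Simple as better is also an assumption.

```python
# Rows copied from the training results table above:
# (step, training loss, validation loss, ranking simple)
ROWS = [
    (50, 0.5807, 0.6822, 0.5196),
    (100, 0.4908, 0.6802, 0.5300),
    (150, 0.4191, 0.6960, 0.5403),
    (200, 0.2287, 0.7276, 0.5362),
    (250, 0.2329, 0.7454, 0.5351),
    (300, 0.2036, 0.7574, 0.5434),
    (350, 0.2102, 0.7556, 0.5403),
]


def best_by_validation_loss(rows):
    """Return the row with the lowest validation loss (hypothetical helper)."""
    return min(rows, key=lambda row: row[2])


def best_by_ranking_simple(rows):
    """Return the row with the highest Ranking Simple value.

    Assumes higher is better, which the card does not state.
    """
    return max(rows, key=lambda row: row[3])


best_val = best_by_validation_loss(ROWS)
best_rank = best_by_ranking_simple(ROWS)

print(f"Lowest validation loss: step {best_val[0]} ({best_val[2]})")
print(f"Highest Ranking Simple: step {best_rank[0]} ({best_rank[3]})")
```

On these values the two criteria disagree: validation loss is lowest at step 100, while Ranking Simple peaks at step 300, so the choice depends on which metric matters for the intended use.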
Base model: hZzy/qwen2.5-0.5b-sft-news-IFT