4.03 GB
17,610 files
Updated 25 minutes ago

maasai-en-mt-staging

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7846
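
For a rough sense of scale, a cross-entropy loss can be converted to perplexity. The following is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card). `loss_to_perplexity` is an illustrative helper name, not part of any library.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


final_eval_loss = 1.7846  # evaluation loss reported above
perplexity = loss_to_perplexity(final_eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation loss corresponds to a perplexity just under 6.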

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.03
  • training_steps: 800
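
The values above can be related with a little arithmetic. The sketch below recomputes the effective batch size and traces a linear warmup and decay schedule in plain Python. It rests on two assumptions that this card does not state: training ran on a single device, and the warmup value of 0.03 is a fraction of the 800 training steps (about 24 steps) rather than a step count. `linear_lr` is a hypothetical helper written for illustration, not the scheduler used in training.

```python
def linear_lr(step, base_lr=2e-4, total_steps=800, warmup=0.03):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    # Assumption: a warmup value below 1 is a fraction of total_steps.
    warmup_steps = round(warmup * total_steps) if warmup < 1 else int(warmup)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


train_batch_size = 1
gradient_accumulation_steps = 32
num_devices = 1  # assumption: single device, consistent with 1 * 32 = 32
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

print("effective batch size:", effective_batch)
for s in (0, 12, 24, 412, 800):
    print(s, linear_lr(s))
```

With these assumptions the rate climbs from 0 to 0.0002 over the first 24 steps and then falls linearly to 0 at step 800.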

Training results

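| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 2.5781        | 0.3794 | 100  | 2.5126          |
| 2.2593        | 0.7588 | 200  | 2.2396          |
| 1.9743        | 1.1366 | 300  | 2.1062          |
| 1.8846        | 1.5160 | 400  | 2.0034          |
| 1.8152        | 1.8954 | 500  | 1.9168          |
| 1.5970        | 2.2732 | 600  | 1.8696          |
| 1.5604        | 2.6526 | 700  | 1.8172          |
| 1.5004        | 3.0304 | 800  | 1.7846          |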
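
The logged results allow two quick derived checks: roughly how many optimizer steps make up one epoch, and how the gap between validation and training loss evolves. The sketch below is only an estimate. It assumes the total train batch size of 32 listed in the hyperparameters, and it compares the logged training loss (typically an average over recent steps) with a validation loss measured at a single checkpoint, so the gap is approximate.

```python
# (training_loss, epoch, step, validation_loss), copied from the results above
rows = [
    (2.5781, 0.3794, 100, 2.5126),
    (2.2593, 0.7588, 200, 2.2396),
    (1.9743, 1.1366, 300, 2.1062),
    (1.8846, 1.5160, 400, 2.0034),
    (1.8152, 1.8954, 500, 1.9168),
    (1.5970, 2.2732, 600, 1.8696),
    (1.5604, 2.6526, 700, 1.8172),
    (1.5004, 3.0304, 800, 1.7846),
]

# Estimate steps per epoch from the final row (step / epoch).
_, final_epoch, final_step, _ = rows[-1]
steps_per_epoch = final_step / final_epoch

# Rough upper estimate of training examples per epoch (32 examples per step).
examples_per_epoch = steps_per_epoch * 32

# Validation loss minus training loss at each logged step.
gaps = [(step, val - train) for train, _, step, val in rows]

print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:.0f}")
for step, gap in gaps:
    print(step, f"{gap:+.4f}")
```

The estimate comes to about 264 steps per epoch, so the 800 steps cover slightly more than three passes over the training data. The gap starts slightly negative and widens to about 0.28 by step 800, while validation loss is still decreasing.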

Framework versions

  • PEFT 0.18.1
  • Transformers 5.4.0
  • Pytorch 2.5.1+cu118
  • Datasets 4.8.4
  • Tokenizers 0.22.2
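
Because PEFT appears in the framework versions, the published weights are presumably an adapter to be applied on top of the base model rather than a standalone checkpoint. The following is a minimal loading sketch under that assumption. The adapter repository path is a hypothetical placeholder (this card does not give the owning namespace), and the library imports are deferred into the function so that defining it does not require the packages or a download.

```python
BASE_MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"  # base model named in this card
ADAPTER_ID = "your-namespace/maasai-en-mt-staging"  # hypothetical repo path


def load_adapter(adapter_id: str = ADAPTER_ID, base_id: str = BASE_MODEL_ID):
    """Load the base model and attach the adapter weights (sketch only)."""
    # Imported lazily: both packages must be installed to call this function.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter repository also ships the tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Calling `load_adapter()` with the real repository path would download the base model as well as the adapter. The prompt format used during fine-tuning is not documented here, so any inference prompt is a guess.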
Last updated
Jun 20