4.03 GB
18,510 files
Updated about 1 hour ago
Name                   Size
.cache
tokenizer_config.json  665 Bytes
tokenizer.json         11.4 MB
adapter_config.json    1.06 kB
README.md              1.96 kB
README.md

maasai-en-mt-staging

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7846

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.03
  • training_steps: 800
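
The batch-size and schedule entries above are related by simple arithmetic. The sketch below, in plain Python, recomputes the total train batch size and evaluates a linear warmup-then-decay schedule at a few steps. It rests on assumptions the card does not state: a single training device, and the warmup value 0.03 being interpreted as a fraction of the 800 training steps. The helper `linear_lr` is a hypothetical name used for illustration, not a library function.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 32
TRAINING_STEPS = 800
WARMUP = 0.03  # assumption: a fraction of TRAINING_STEPS, not a step count

# Assumption: one device, so there is no extra device multiplier.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: a fractional warmup is rounded up to whole optimizer steps.
warmup_steps = math.ceil(WARMUP * TRAINING_STEPS)


def linear_lr(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - warmup_steps)


# Number of training examples processed over the whole run.
examples_seen = total_train_batch_size * TRAINING_STEPS

if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("warmup_steps:", warmup_steps)
    print("examples_seen:", examples_seen)
    for step in (0, warmup_steps, 400, TRAINING_STEPS):
        print(f"step {step:>3}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the product of the per-device batch size and the accumulation steps matches the reported total_train_batch_size of 32.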

Training results

Training Loss  Epoch   Step  Validation Loss
2.5781         0.3794  100   2.5126
2.2593         0.7588  200   2.2396
1.9743         1.1366  300   2.1062
1.8846         1.5160  400   2.0034
1.8152         1.8954  500   1.9168
1.5970         2.2732  600   1.8696
1.5604         2.6526  700   1.8172
1.5004         3.0304  800   1.7846
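
If the reported validation loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but an assumption here, since the card does not say), it can be converted to perplexity as exp(loss). The sketch below applies that conversion to the validation-loss column of the table; `perplexity` is an illustrative helper, not part of any library.

```python
import math

# (step, validation loss) pairs copied from the training results table.
validation_loss = [
    (100, 2.5126),
    (200, 2.2396),
    (300, 2.1062),
    (400, 2.0034),
    (500, 1.9168),
    (600, 1.8696),
    (700, 1.8172),
    (800, 1.7846),
]


def perplexity(loss: float) -> float:
    """Assumes `loss` is a mean per-token cross-entropy measured in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    for step, loss in validation_loss:
        print(f"step {step:>3}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, the final validation loss of 1.7846 corresponds to a perplexity of roughly 5.96. The validation loss is still falling at step 800, so the run may not have converged.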

Framework versions

  • PEFT 0.18.1
  • Transformers 5.4.0
  • Pytorch 2.5.1+cu118
  • Datasets 4.8.4
  • Tokenizers 0.22.2
Last updated
Jun 22