Bucket: NorthernTribe-Research/maasai-project-storage

4.03 GB, 18,510 files, updated about 1 hour ago


Name	Size	Uploaded	Xet hash
.cache		7 days ago	6 items
tokenizer_config.json	665 Bytes	7 days ago	db0224f0
tokenizer.json	11.4 MB	7 days ago	77f0fc88
adapter_config.json	1.06 kB	7 days ago	690ff6c6
README.md	1.96 kB	7 days ago	12cd5c7c

README.md

maasai-en-mt-staging

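This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct; the training dataset is not specified. The card lists no separate evaluation metrics; the only reported result is the validation loss, which reaches 1.7846 at step 800 (see the training results table).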

More information needed


The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 32
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 0.03
training_steps: 800
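
The relationship between these values can be sketched in a few lines of plain Python. This is a minimal illustration, not the training code itself. It assumes a single device (the card does not list a device count), and it assumes that the logged value 0.03 for lr_scheduler_warmup_steps is a fraction of the total training steps rather than a step count. The helper names `effective_batch_size`, `warmup_steps`, and `lr_at` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 32
TRAINING_STEPS = 800
WARMUP_VALUE = 0.03  # logged as lr_scheduler_warmup_steps

# Assumption: one device, so data parallelism adds no extra multiplier.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def warmup_steps() -> int:
    """Warmup length in optimizer steps.

    Assumption: a value below 1 is read as a fraction of TRAINING_STEPS.
    """
    if WARMUP_VALUE < 1:
        return round(TRAINING_STEPS * WARMUP_VALUE)
    return int(WARMUP_VALUE)


def lr_at(step: int) -> float:
    """Linear warmup from 0, then linear decay to 0 at TRAINING_STEPS."""
    warmup = warmup_steps()
    if step < warmup:
        return LEARNING_RATE * step / warmup
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - warmup)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("warmup steps:", warmup_steps())
    for step in (0, 12, 24, 400, 800):
        print(step, lr_at(step))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 32, and the learning rate peaks at 0.0002 after 24 steps before decaying linearly to zero at step 800.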

Training Loss	Epoch	Step	Validation Loss
2.5781	0.3794	100	2.5126
2.2593	0.7588	200	2.2396
1.9743	1.1366	300	2.1062
1.8846	1.5160	400	2.0034
1.8152	1.8954	500	1.9168
1.5970	2.2732	600	1.8696
1.5604	2.6526	700	1.8172
1.5004	3.0304	800	1.7846
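
One way to read the table is to convert the losses into perplexities and track the gap between validation and training loss. The sketch below does this in plain Python. It assumes the reported losses are mean per-token cross-entropy values in nats, which the card does not state, and the names `ROWS`, `perplexity`, and `gap` are hypothetical helpers introduced for illustration.

```python
import math

# (step, epoch, training_loss, validation_loss), copied from the table above.
ROWS = [
    (100, 0.3794, 2.5781, 2.5126),
    (200, 0.7588, 2.2593, 2.2396),
    (300, 1.1366, 1.9743, 2.1062),
    (400, 1.5160, 1.8846, 2.0034),
    (500, 1.8954, 1.8152, 1.9168),
    (600, 2.2732, 1.5970, 1.8696),
    (700, 2.6526, 1.5604, 1.8172),
    (800, 3.0304, 1.5004, 1.7846),
]


def perplexity(loss: float) -> float:
    """Perplexity, assuming the loss is mean cross-entropy in nats."""
    return math.exp(loss)


def gap(row: tuple) -> float:
    """Validation loss minus training loss at one logging step."""
    _, _, train_loss, val_loss = row
    return val_loss - train_loss


def validation_always_improves(rows: list) -> bool:
    """True if validation loss drops at every logged step."""
    val = [r[3] for r in rows]
    return all(later < earlier for earlier, later in zip(val, val[1:]))


if __name__ == "__main__":
    for row in ROWS:
        step, epoch, _, val_loss = row
        print(f"step {step:3d}  epoch {epoch:.2f}  "
              f"val ppl {perplexity(val_loss):.2f}  gap {gap(row):+.4f}")
    print("validation loss still decreasing:", validation_always_improves(ROWS))
```

Under that assumption the final validation loss of 1.7846 corresponds to a perplexity of about 5.96. Validation loss falls at every logged step, while the gap to the training loss widens from slightly negative at step 100 to about 0.28 at step 800.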