Bucket: NorthernTribe-Research/maasai-project-storage

4.03 GB

17,610 files

Updated 25 minutes ago

Name	Size	Uploaded	Xet hash
.cache		1 day ago	6 items
README.md	1.96 kB xet	1 day ago	12cd5c7c
adapter_config.json	1.06 kB xet	1 day ago	690ff6c6
tokenizer.json	11.4 MB xet	1 day ago	77f0fc88
tokenizer_config.json	665 Bytes xet	1 day ago	db0224f0

README.md

maasai-en-mt-staging

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct; the training dataset is not specified. No summary evaluation metrics are listed; the last logged checkpoint (step 800) has a validation loss of 1.7846.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 32
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 0.03
training_steps: 800
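
As a rough illustration of how the settings above combine, the sketch below recomputes the effective batch size and evaluates a linear schedule with warmup. It assumes a single training device (consistent with 1 x 32 = 32) and reads the fractional value 0.03 as a warmup fraction of the 800 training steps; that reading is an interpretation, not something the card states. The helper names are hypothetical, and the schedule is the generic linear warmup followed by linear decay, not necessarily the exact Trainer implementation.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 32
TRAINING_STEPS = 800
WARMUP = 0.03
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * devices


def warmup_step_count(warmup: float, total_steps: int) -> int:
    """Assumption: values below 1 are a fraction of the total steps."""
    if warmup < 1:
        return round(warmup * total_steps)
    return int(warmup)


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear ramp from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
    warmup_steps = warmup_step_count(WARMUP, TRAINING_STEPS)
    print(f"effective batch size: {batch}")
    print(f"warmup steps: {warmup_steps}")
    for step in (0, warmup_steps, 400, TRAINING_STEPS):
        lr = linear_lr(step, LEARNING_RATE, warmup_steps, TRAINING_STEPS)
        print(f"step {step:>3}: lr = {lr:.6e}")
```

Under these assumptions the learning rate peaks at the end of warmup and reaches zero at step 800.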

Training Loss	Epoch	Step	Validation Loss
2.5781	0.3794	100	2.5126
2.2593	0.7588	200	2.2396
1.9743	1.1366	300	2.1062
1.8846	1.5160	400	2.0034
1.8152	1.8954	500	1.9168
1.5970	2.2732	600	1.8696
1.5604	2.6526	700	1.8172
1.5004	3.0304	800	1.7846
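
To make the log above easier to interpret, the following sketch converts each validation loss to perplexity and estimates the number of training examples per epoch from the step-to-epoch ratio. It assumes the reported loss is a mean per-token cross-entropy in nats (the usual convention for causal language model training, though the card does not say so) and that each optimizer step consumes 32 examples, as total_train_batch_size suggests. The function names are illustrative.

```python
import math

# (training_loss, epoch, step, validation_loss), copied from the table above.
LOG = [
    (2.5781, 0.3794, 100, 2.5126),
    (2.2593, 0.7588, 200, 2.2396),
    (1.9743, 1.1366, 300, 2.1062),
    (1.8846, 1.5160, 400, 2.0034),
    (1.8152, 1.8954, 500, 1.9168),
    (1.5970, 2.2732, 600, 1.8696),
    (1.5604, 2.6526, 700, 1.8172),
    (1.5004, 3.0304, 800, 1.7846),
]
EXAMPLES_PER_STEP = 32  # total_train_batch_size from the card


def perplexity(loss: float) -> float:
    """Perplexity, assuming the loss is mean cross-entropy in nats."""
    return math.exp(loss)


def estimated_examples_per_epoch(epoch: float, step: int,
                                 examples_per_step: int = EXAMPLES_PER_STEP) -> float:
    """Rough dataset size implied by the epoch fraction reached at a given step."""
    return step * examples_per_step / epoch


if __name__ == "__main__":
    for train_loss, epoch, step, val_loss in LOG:
        gap = val_loss - train_loss
        print(f"step {step:>3}: val perplexity {perplexity(val_loss):6.2f}, "
              f"val - train gap {gap:+.4f}")
    _, first_epoch, first_step, _ = LOG[0]
    size = estimated_examples_per_epoch(first_epoch, first_step)
    print(f"estimated examples per epoch: {size:.0f}")
```

Validation loss decreases at every logged checkpoint, while training loss drops below it from step 300 onward. The dataset-size figure is only an estimate derived from rounded epoch values.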