speecht5_arabic_female_voice

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not specified (the card records it as None). On the evaluation set, the final validation loss is 0.3636 at step 4000.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 4
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 400
training_steps: 4000
mixed_precision_training: Native AMP
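
How these values relate to each other can be checked with a short sketch. It assumes a single training device (which is what 4 x 8 = 32 implies) and that `linear` means linear warmup followed by linear decay to zero, as in the usual Transformers linear schedule. The function names are illustrative and are not taken from the training script.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_schedule_lr(step: int, peak_lr: float = 3e-5,
                       warmup_steps: int = 400, total_steps: int = 4000) -> float:
    """Learning rate at a given optimizer step (assumed warmup-then-decay shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# train_batch_size=4, gradient_accumulation_steps=8, one device assumed
print(effective_batch_size(4, 8))  # 32

# Learning rate at a few of the logged steps
for step in (0, 200, 400, 2200, 4000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 3e-05 at step 400 and reaches zero at step 4000, the last logged step.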

Training Loss	Epoch	Step	Validation Loss
0.5085	4.3360	200	0.4609
0.4697	8.6721	400	0.4265
0.4509	13.0081	600	0.4111
0.4289	17.3442	800	0.4057
0.4208	21.6802	1000	0.4049
0.4136	26.0163	1200	0.3990
0.4072	30.3523	1400	0.3980
0.4011	34.6883	1600	0.3920
0.3971	39.0244	1800	0.3905
0.3898	43.3604	2000	0.3886
0.3923	47.8130	2200	0.3813
0.3755	52.1491	2400	0.3691
0.3629	56.4851	2600	0.3655
0.3599	60.8211	2800	0.3660
0.3525	65.1572	3000	0.3632
0.3493	69.4932	3200	0.3603
0.3452	73.8293	3400	0.3626
0.3468	78.1653	3600	0.3597
0.3411	82.5014	3800	0.3621
0.3464	86.8374	4000	0.3636
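
The validation loss does not fall monotonically in the last few evaluations, so the final checkpoint is not necessarily the best one. A minimal sketch for locating the lowest validation loss, using the rows copied from the table above (the helper name `best_checkpoint` is illustrative):

```python
# (step, training_loss, validation_loss) copied from the results table
RESULTS = [
    (200, 0.5085, 0.4609), (400, 0.4697, 0.4265), (600, 0.4509, 0.4111),
    (800, 0.4289, 0.4057), (1000, 0.4208, 0.4049), (1200, 0.4136, 0.3990),
    (1400, 0.4072, 0.3980), (1600, 0.4011, 0.3920), (1800, 0.3971, 0.3905),
    (2000, 0.3898, 0.3886), (2200, 0.3923, 0.3813), (2400, 0.3755, 0.3691),
    (2600, 0.3629, 0.3655), (2800, 0.3599, 0.3660), (3000, 0.3525, 0.3632),
    (3200, 0.3493, 0.3603), (3400, 0.3452, 0.3626), (3600, 0.3468, 0.3597),
    (3800, 0.3411, 0.3621), (4000, 0.3464, 0.3636),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best_step, _, best_val = best_checkpoint(RESULTS)
final_step, _, final_val = RESULTS[-1]

print(f"best:  step {best_step}, validation loss {best_val}")
print(f"final: step {final_step}, validation loss {final_val}")
```

On these numbers the lowest validation loss is 0.3597 at step 3600, slightly below the 0.3636 recorded at step 4000. Whether a checkpoint from step 3600 was saved is not stated in the card.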

Model size: 0.1B params
Tensor type: F32 (Safetensors)
