The parameters used to train this model are encoded as follows:

`G-B1,DR0.2,ACC1,L2.5e-05,R32,A128,E20,TDFT,BF16T,GCLT,MG0.8,GCHF,D1.4,CE0.04,W0.03,cosine,R1` - THE KING

Here is the Python command we used to train it in a Google Colab notebook:

```python
# Begin the fine-tuning process
%cd /content/drive/MyDrive/VibeVoice-finetuning/

# Define your parameters as Python variables
batch_size = 1
drop_rate = 0.2
grad_accum = 1
lr = 2.5e-5
lora_r = 32
lora_alpha = 128
epochs = 20
train_diff = True
bf16 = True
grad_clip = True  # only recorded in the directory name; --gradient_clipping is always passed below
max_grad = 0.8
grad_checkpoint = False
diff_weight = 1.4
ce_weight = 0.04
warmup = 0.03
scheduler = "cosine"
run_num = 2

# Build the output directory name dynamically from the parameters above
output_dir = (
    f"Precise/G-B{batch_size},DR{drop_rate},ACC{grad_accum},"
    f"L{lr},R{lora_r},A{lora_alpha},E{epochs},"
    f"TDF{'T' if train_diff else 'F'},BF16{'T' if bf16 else 'F'},"
    f"GCL{'T' if grad_clip else 'F'},MG{max_grad},"
    f"GCH{'T' if grad_checkpoint else 'F'},D{diff_weight},"
    f"CE{ce_weight},W{warmup},{scheduler},R{run_num}"
)

# Now use the variables in your command ({...} is expanded by IPython)
!python -m src.finetune_vibevoice_lora \
  --model_name_or_path vibevoice/VibeVoice-7B \
  --processor_name_or_path src/vibevoice/processor \
  --text_column_name text \
  --audio_column_name audio \
  --output_dir {output_dir} \
  --train_jsonl GOLD_cortana_train_data.jsonl \
  --per_device_train_batch_size {batch_size} \
  --voice_prompt_drop_rate {drop_rate} \
  --gradient_accumulation_steps {grad_accum} \
  --learning_rate {lr} \
  --lora_r {lora_r} \
  --lora_alpha {lora_alpha} \
  --num_train_epochs {epochs} \
  --train_diffusion_head {train_diff} \
  --bf16 {bf16} \
  --gradient_clipping \
  --max_grad_norm {max_grad} \
  --gradient_checkpointing {grad_checkpoint} \
  --diffusion_loss_weight {diff_weight} \
  --ce_loss_weight {ce_weight} \
  --warmup_ratio {warmup} \
  --lr_scheduler_type {scheduler} \
  --logging_steps 10 \
  --save_steps 1528 \
  --report_to wandb \
  --remove_unused_columns False \
  --do_train \
  --ddpm_batch_mul 4 \
  --lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
```
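The run name packs every hyperparameter into one comma-separated string, so decoding it back into named values can help when comparing runs. The following is a minimal sketch. It assumes the field order of the `output_dir` f-string in the command, and `decode_run_name` and its key names are hypothetical helpers, not part of VibeVoice-finetuning. Parsing is positional because `R` appears twice (LoRA rank and run number). Pass the bare run name, without a trailing label such as ` - THE KING`.

```python
from typing import Any, Callable, Optional

# Field order mirrors the output_dir f-string in the training command.
# Positional parsing is required because the prefix "R" is used twice.
_FIELDS: list[tuple[str, str, Optional[Callable[[str], Any]]]] = [
    ("G-B", "batch_size", int),
    ("DR", "drop_rate", float),
    ("ACC", "grad_accum", int),
    ("L", "lr", float),
    ("R", "lora_r", int),
    ("A", "lora_alpha", int),
    ("E", "epochs", int),
    ("TDF", "train_diff", None),      # None marks a T/F boolean flag
    ("BF16", "bf16", None),
    ("GCL", "grad_clip", None),
    ("MG", "max_grad", float),
    ("GCH", "grad_checkpoint", None),
    ("D", "diff_weight", float),
    ("CE", "ce_weight", float),
    ("W", "warmup", float),
    ("", "scheduler", str),           # scheduler name has no prefix
    ("R", "run_num", int),
]


def decode_run_name(name: str) -> dict[str, Any]:
    """Turn a run name such as 'G-B1,DR0.2,...,R1' back into named hyperparameters."""
    name = name.rsplit("/", 1)[-1]  # drop a directory prefix such as "Precise/"
    parts = name.split(",")
    if len(parts) != len(_FIELDS):
        raise ValueError(f"expected {len(_FIELDS)} fields, got {len(parts)}")
    params: dict[str, Any] = {}
    for part, (prefix, key, cast) in zip(parts, _FIELDS):
        if not part.startswith(prefix):
            raise ValueError(f"field {part!r} should start with {prefix!r}")
        value = part[len(prefix):]
        if cast is None:
            if value not in ("T", "F"):
                raise ValueError(f"flag {part!r} must end in T or F")
            params[key] = value == "T"
        else:
            params[key] = cast(value)
    return params
```

For example, decoding the name above gives `lora_r == 32` and `lora_alpha == 128`, and `lr` comes back as the float `2.5e-05`.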
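The `--lr_scheduler_type cosine` and `--warmup_ratio 0.03` flags match Hugging Face `TrainingArguments` names. The following sketch therefore assumes the script uses that library's standard cosine-with-warmup schedule: a linear ramp from 0 to `lr` over the first 3% of optimizer steps, followed by a half-cosine decay to 0. The dataset size is not stated here, so `num_examples` is a free input. `total_optimizer_steps` and `lr_at_step` are illustrative helpers, and the step count is an approximation for training on a single device.

```python
import math


def total_optimizer_steps(num_examples: int, batch_size: int = 1, grad_accum: int = 1,
                          epochs: int = 20, num_devices: int = 1) -> int:
    """Approximate number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


def lr_at_step(step: int, total_steps: int, base_lr: float = 2.5e-5,
               warmup_ratio: float = 0.03) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, a hypothetical 1,000-example dataset with `batch_size = 1`, `grad_accum = 1`, and 20 epochs gives 20,000 updates, and the first 600 of them are warmup.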
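`R32` and `A128` are the LoRA rank and alpha. Suppose the script uses PEFT-style LoRA with the default scaling (not rsLoRA). In that case, each projection listed in `--lora_target_modules` computes `W x + (alpha / r) * B (A x)`, so these settings give a scaling factor of 128 / 32 = 4. The following pure-Python sketch illustrates that forward pass on toy matrices. Dropout is omitted, and all function names are hypothetical.

```python
def lora_scaling(lora_r: int, lora_alpha: int) -> float:
    # Default LoRA scaling; rsLoRA would use lora_alpha / sqrt(lora_r) instead.
    return lora_alpha / lora_r


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    # Plain matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: list[list[float]], A: list[list[float]], B: list[list[float]],
                 x: list[float], lora_r: int, lora_alpha: int) -> list[float]:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)                 # frozen pretrained projection
    update = matvec(B, matvec(A, x))    # low-rank path: d_in -> r -> d_out
    s = lora_scaling(lora_r, lora_alpha)
    return [b + s * u for b, u in zip(base, update)]
```

Because `B` is usually initialized to zeros, the adapted layer starts out identical to the frozen one. The alpha / r factor then sets how strongly the learned update is applied.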
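`D1.4` and `CE0.04` weight the diffusion and cross-entropy losses, and `MG0.8` caps the gradient norm. The following sketch makes two assumptions. First, the two losses are combined as a weighted sum, which the flag names suggest but which is not confirmed here. Second, clipping rescales all gradients by one shared factor whenever their global L2 norm exceeds `max_grad_norm`, as `torch.nn.utils.clip_grad_norm_` does. Gradients are plain nested lists so the example runs without PyTorch, and the helper names are hypothetical.

```python
import math


def total_loss(ce_loss: float, diffusion_loss: float,
               ce_weight: float = 0.04, diff_weight: float = 1.4) -> float:
    # Assumed weighted sum of the two training objectives.
    return ce_weight * ce_loss + diff_weight * diffusion_loss


def clip_by_global_norm(grads: list[list[float]], max_norm: float = 0.8):
    """Scale every gradient by one factor so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # never scales gradients up
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm
```

Under this assumption, one unit of diffusion loss counts 35 times as much as one unit of cross-entropy loss (1.4 / 0.04).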