--- library_name: transformers license: gemma base_model: aisingapore/Gemma-SEA-LION-v4-27B-IT tags: - generated_from_trainer model-index: - name: outputs-pt/sealion-v4-gemma-3-27b-hero_pre_train_v2 results: [] --- [Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config axolotl version: `0.12.1` ```yaml base_model: aisingapore/Gemma-SEA-LION-v4-27B-IT # Automatically upload checkpoint and final model to HF # hub_model_id: username/custom_model_name skip_prepare_dataset: false remove_unused_columns: false sample_packing: false ddp_find_unused_parameters: true deepspeed: deepspeed_configs/zero3.json chat_template: gemma3 eot_tokens: - datasets: - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_textcaps_EN_SEA type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xm3600_EN_SEA type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_1 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_2 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_3 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_4 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_5 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_6 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_7 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_8 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_9 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_xflickr_EN_SEA_chunk_10 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_1 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_2 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_3 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_4 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_5 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_6 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_7 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_8 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_9 type: chat_template split: train field_messages: messages - path: /mnt/weka/all/holy/translation_parallel/pretrain_dataset_sea_vl_EN_SEA_chunk_10 type: chat_template split: train field_messages: messages dataset_prepared_path: peerat_test_path val_set_size: 0.01 output_dir: ./outputs-pt/sealion-v4-gemma-3-27b-hero_pre_train_v2 sequence_len: 2048 pad_to_sequence_len: false wandb_project: wandb_entity: wandb_watch: wandb_name: wandb_log_model: gradient_accumulation_steps: 2 micro_batch_size: 4 num_epochs: 1 optimizer: paged_adamw_8bit lr_scheduler: cosine learning_rate: 2e-5 bf16: true fp16: tf32: true gradient_checkpointing: true gradient_checkpointing_kwargs: use_reentrant: false resume_from_checkpoint: logging_steps: 1 flash_attention: false # attn_implementation: sdpa warmup_ratio: 0.1 evals_per_epoch: 1 save_steps: 1000 save_total_limit: 3 weight_decay: 0.0 # save_first_step: true # uncomment this to validate checkpoint saving works with your config ```

# outputs-pt/sealion-v4-gemma-3-27b-hero_pre_train_v2 This model is a fine-tuned version of [aisingapore/Gemma-SEA-LION-v4-27B-IT](https://huggingface.co/aisingapore/Gemma-SEA-LION-v4-27B-IT) on the None dataset. ## Model description More information needed ## Intended uses & limitations More information needed ## Training and evaluation data More information needed ## Training procedure ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - train_batch_size: 4 - eval_batch_size: 4 - seed: 42 - distributed_type: multi-GPU - num_devices: 80 - gradient_accumulation_steps: 2 - total_train_batch_size: 640 - total_eval_batch_size: 320 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments - lr_scheduler_type: cosine - lr_scheduler_warmup_steps: 1156 - training_steps: 11568 ### Training results ### Framework versions - Transformers 4.55.0 - Pytorch 2.6.0+cu124 - Datasets 4.0.0 - Tokenizers 0.21.4