---
library_name: transformers
tags: []
---

# Model Card for Model ID

current batches: `nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1b2: 4000)`

Trying `google/siglip2-large-patch16-512` instead of DINOv2 for a model difference (it turns out 1% better than `google/siglip2-base-patch16-512`).

eval metrics:

```
wandb: Run summary:
wandb:               eval/accuracy 0.77533
wandb:                   eval/loss 0.4809
wandb:                eval/runtime 15.9025
wandb:     eval/samples_per_second 111.114
wandb:       eval/steps_per_second 0.692
wandb:                  total_flos 1.4915777670524436e+20
wandb:                 train/epoch 10.0
wandb:           train/global_step 570
wandb:             train/grad_norm 375217.9375
wandb:         train/learning_rate 0.0
wandb:                  train/loss 0.286
wandb:                  train_loss 0.40591
wandb:               train_runtime 1032.5423
wandb:    train_samples_per_second 96.974
wandb:      train_steps_per_second 0.552
```

## Model Details

trainlib commit: 1b17bfef5ccbb5a22157e56ab8da71ba7c8c0ed6
- (it was committed right after aug was changed for a later task)

training script:

```bash
#!/bin/bash

# =================== BEGIN NOTES =======================
# BS24 ooms; bs18 66943MiB / 81559MiB; try bs22
# bs22 (try to match siglip2-base for large as much as possible): 77679MiB / 81559MiB

# ORIGINAL AUGMENTATION:
# - model trained on this with exact config had eval/accuracy 0.77533
# train_transforms = Compose([
#     RandomResizedCrop(size),
#     RandomHorizontalFlip(),
#     ToTensor(),
#     normalize,
# ])

# MODIFIED AUGMENTATION:
# from torchvision.transforms import Compose, RandomResizedCrop, RandomRotation, RandomHorizontalFlip, ColorJitter, RandomApply, GaussianBlur, ToTensor
# train_transforms = Compose([
#     RandomResizedCrop(size=224, scale=(0.8, 1.0), ratio=(0.9, 1.1)),
#     RandomRotation(5),
#     RandomHorizontalFlip(p=0.2),
#     ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
#     RandomApply([GaussianBlur(kernel_size=3, sigma=(0.5, 1.5))], p=0.1),
#     ToTensor(),
#     normalize,
# ])
# =================== END NOTES ==========================

# Define variables
BASE_MODEL="google/siglip2-large-patch16-512"
DATASET="distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK="classification"
NUM_EPOCHS=10

# Run training command
python -m trainlib.hf_trainer.cli \
    --model_name_or_path "$BASE_MODEL" \
    --dataset_name "$DATASET" \
    --output_dir "distill-n4_00-01_combined_cls_v1b2_classification_$BASE_MODEL" \
    --remove_unused_columns False \
    --label_column_name star \
    --task "$TASK" \
    --do_train \
    --do_eval \
    --eval_strategy steps \
    --eval_steps 100 \
    --learning_rate 5e-6 \
    --num_train_epochs "$NUM_EPOCHS" \
    --per_device_train_batch_size 22 \
    --per_device_eval_batch_size 22 \
    --logging_strategy steps \
    --logging_steps 2 \
    --save_total_limit 1 \
    --seed 1337 \
    --lr_scheduler_type cosine \
    --dataloader_num_workers 16 \
    --ignore_mismatched_sizes True \
    --fp16 True  # EXTRA ARGUMENT
```
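
sanity check on the numbers: the script does not state the GPU count or dataset sizes, but they can be roughly backed out of the run summary. The sketch below is only an estimate and rests on assumptions not confirmed anywhere in this card: that gradient accumulation was left at 1 (the script passes no flag for it), and that `train_samples_per_second` and `eval/samples_per_second` are samples divided by runtime. The variable names are made up for illustration.

```python
import math

# Values copied from the wandb run summary and the training script above.
train_runtime_s = 1032.5423
train_samples_per_second = 96.974
global_step = 570
num_epochs = 10
per_device_batch = 22
eval_runtime_s = 15.9025
eval_samples_per_second = 111.114
eval_steps_per_second = 0.692

# Training side: total samples processed, then per-epoch and per-step figures.
samples_seen = train_runtime_s * train_samples_per_second
samples_per_epoch = samples_seen / num_epochs
steps_per_epoch = global_step / num_epochs
effective_batch = samples_per_epoch / steps_per_epoch

# ASSUMPTION: gradient accumulation of 1, so effective batch = devices * per-device batch.
implied_devices = effective_batch / per_device_batch

# Eval side: samples and steps covered by one evaluation pass.
eval_samples = eval_runtime_s * eval_samples_per_second
eval_steps = eval_runtime_s * eval_steps_per_second

print(f"approx. train samples per epoch: {samples_per_epoch:.0f}")
print(f"approx. effective batch size:    {effective_batch:.1f}")
print(f"implied device count:            {implied_devices:.2f}")
print(f"approx. eval samples:            {eval_samples:.0f}")
print(f"approx. eval steps:              {eval_steps:.1f}")

# Cross-check: with the rounded device count, the step counts should be reproducible.
devices = round(implied_devices)
print("steps/epoch from estimate:", math.ceil(samples_per_epoch / (devices * per_device_batch)))
print("eval steps from estimate: ", math.ceil(round(eval_samples) / (devices * per_device_batch)))
```

If the assumptions hold, the implied device count lands close to a whole number, and the step counts recomputed from it line up with `train/global_step` and `eval/steps_per_second`. Treat it as a consistency check on the logs, not a record of the actual hardware.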