---
library_name: transformers
tags: []
---

# Model Card for Model ID

current batches:

`nv3[v0] (1700) | nv4[v1-2k] (4000) | nv4[v1-210k] (b1b2: 4000)`

This run tries `google/siglip2-large-patch16-512` instead of DINOv2 to compare base models (it turned out 1% better than `google/siglip2-base-patch16-512`).

eval metrics:
```
wandb: Run summary:
wandb:            eval/accuracy 0.77533
wandb:                eval/loss 0.4809
wandb:             eval/runtime 15.9025
wandb:  eval/samples_per_second 111.114
wandb:    eval/steps_per_second 0.692
wandb:               total_flos 1.4915777670524436e+20
wandb:              train/epoch 10.0
wandb:        train/global_step 570
wandb:          train/grad_norm 375217.9375
wandb:      train/learning_rate 0.0
wandb:               train/loss 0.286
wandb:               train_loss 0.40591
wandb:            train_runtime 1032.5423
wandb: train_samples_per_second 96.974
wandb:   train_steps_per_second 0.552
```
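The run summary also implies a few quantities it does not state directly. The sketch below is back-of-the-envelope arithmetic on the logged values. It assumes no gradient accumulation and that the rounded throughput figures are accurate enough to divide, so the results are estimates rather than logged facts.

```python
# Values copied from the wandb run summary above.
eval_runtime = 15.9025
eval_samples_per_second = 111.114
train_runtime = 1032.5423
train_samples_per_second = 96.974
train_steps_per_second = 0.552
num_epochs = 10
per_device_batch_size = 22  # --per_device_train_batch_size in the training script

# Samples seen during one evaluation pass.
eval_samples = eval_runtime * eval_samples_per_second

# Samples consumed per optimizer step across all devices.
effective_batch = train_samples_per_second / train_steps_per_second

# Device count implied by the effective batch (assumes no gradient accumulation).
implied_devices = effective_batch / per_device_batch_size

# Training samples per epoch, from total throughput over the whole run.
train_samples_per_epoch = train_runtime * train_samples_per_second / num_epochs

print(f"eval set size           ~ {eval_samples:.0f}")
print(f"effective batch size    ~ {effective_batch:.0f}")
print(f"implied device count    ~ {implied_devices:.1f}")
print(f"train samples per epoch ~ {train_samples_per_epoch:.0f}")
```

Under these assumptions the figures point to roughly 1,767 eval samples, an effective batch of about 176 (22 per device across about 8 devices), and roughly 10,000 training samples per epoch.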


## Model Details

trainlib commit: 1b17bfef5ccbb5a22157e56ab8da71ba7c8c0ed6  
- (it was committed right after the augmentation was changed for a later task)

training script:

```bash
#!/bin/bash

# =================== BEGIN NOTES =======================

# bs24 OOMs; bs18 uses 66943MiB / 81559MiB; try bs22
# bs22 (chosen to match the siglip2-base run as closely as possible for large): 77679MiB / 81559MiB

# ORIGINAL AUGMENTATION:
# - the model trained with this augmentation and this exact config reached eval/accuracy 0.77533

# train_transforms = Compose([
#     RandomResizedCrop(size),
#     RandomHorizontalFlip(),
#     ToTensor(),
#     normalize,
# ])

# MODIFIED AUGMENTATION:

# from torchvision.transforms import Compose, RandomResizedCrop, RandomRotation, RandomHorizontalFlip, ColorJitter, RandomApply, GaussianBlur, ToTensor

# train_transforms = Compose([
#     RandomResizedCrop(size=224, scale=(0.8, 1.0), ratio=(0.9, 1.1)),
#     RandomRotation(5),
#     RandomHorizontalFlip(p=0.2),
#     ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
#     RandomApply([GaussianBlur(kernel_size=3, sigma=(0.5, 1.5))], p=0.1),
#     ToTensor(),
#     normalize,
# ])


# =================== END NOTES ==========================

# Define variables
BASE_MODEL="google/siglip2-large-patch16-512"
DATASET="distill-lab/COMBINE_nai-distill_00-01_eagle.library"
TASK="classification"
NUM_EPOCHS=10


# Run training command
# Note: BASE_MODEL contains a slash, so --output_dir below resolves to a nested
# path ending in ..._classification_google/siglip2-large-patch16-512.
python -m trainlib.hf_trainer.cli \
  --model_name_or_path "$BASE_MODEL" \
  --dataset_name "$DATASET" \
  --output_dir "distill-n4_00-01_combined_cls_v1b2_classification_$BASE_MODEL" \
  --remove_unused_columns False \
  --label_column_name star \
  --task "$TASK" \
  --do_train \
  --do_eval \
  --eval_strategy steps \
  --eval_steps 100 \
  --learning_rate 5e-6 \
  --num_train_epochs "$NUM_EPOCHS" \
  --per_device_train_batch_size 22 \
  --per_device_eval_batch_size 22 \
  --logging_strategy steps \
  --logging_steps 2 \
  --save_total_limit 1 \
  --seed 1337 \
  --lr_scheduler_type cosine \
  --dataloader_num_workers 16 \
  --ignore_mismatched_sizes True \
  --fp16 True  # EXTRA ARGUMENT

```
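The batch-size notes in the script can be sanity-checked with a linear extrapolation. The sketch below assumes peak GPU memory grows linearly with per-device batch size between the two measured points, which is a simplification (allocator behavior and fragmentation can break it). It uses only the figures from the notes; the helper name `predicted_mib` is made up for illustration.

```python
# Measured peak memory from the script notes: per-device batch size -> MiB used.
measured = {18: 66943, 22: 77679}
gpu_capacity_mib = 81559

# Slope of a straight line through the two measurements (MiB per extra sample).
mib_per_sample = (measured[22] - measured[18]) / (22 - 18)


def predicted_mib(batch_size: int) -> float:
    """Linearly extrapolate peak memory from the bs22 measurement."""
    return measured[22] + (batch_size - 22) * mib_per_sample


for bs in (22, 23, 24):
    usage = predicted_mib(bs)
    verdict = "fits" if usage <= gpu_capacity_mib else "exceeds capacity"
    print(f"bs{bs}: ~{usage:.0f} MiB ({verdict})")
```

The estimate of about 2,684 MiB per sample puts bs24 near 83,047 MiB, above the 81,559 MiB available, which is consistent with the OOM noted in the script. bs23 would fit only narrowly under this model, so bs22 leaves a safer margin.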