Qwen3.5-2B-metamath 8192 speed estimate

  • Data: metamath-output/setmm-train-qwen35-4b-mixed-12000, 4000 original + 12000 expanded examples (16000 total).
  • max_length=8192 keeps about 14492 of 16000 examples (roughly 90.6%), according to a tokenizer length scan.
  • Training config: Qwen3.5-2B base, FLA fast path available, LoRA rank 32/alpha 64/dropout 0.05, bf16, gradient checkpointing, lr=5e-4, 1 epoch.
  • Batch: per-device train batch size 2, gradient accumulation 8, one GPU, so the effective batch size is 2 x 8 = 16.
  • Smoke run: 59 train examples, 4 optimizer steps, runtime 120.9s. First step was about 94s due to compile/init; later steps were about 8-11s/step on the short smoke sample.
  • Full 1-epoch steps: about 888 optimizer steps (roughly 14202 training examples left after the 2% eval split, at 16 examples per step).
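  • Estimated full runtime: roughly 4-8 hours (about 16-32s/step averaged over 888 steps), depending on length mix, checkpoint/eval cost, and whether the compiled kernels stay warm. The smoke-run rate of 8-11s/step is likely a lower bound, since the smoke sample was short.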
  • Log: /data/pretrained_models/Qwen3.5-2B-metamath/train-8192.log
  • PID file: /data/pretrained_models/Qwen3.5-2B-metamath/train-8192.pid
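
The step count and runtime range above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the eval split size is rounded up and that a partial final accumulation window still counts as one optimizer step. The actual trainer may round differently.

```python
import math


def optimizer_steps(num_examples, eval_fraction, per_device_batch, grad_accum, num_gpus=1):
    """Return (train_examples, effective_batch, optimizer_steps) for one epoch.

    Assumption: the eval split is rounded up, and the last partial
    batch still produces an optimizer step.
    """
    eval_examples = math.ceil(num_examples * eval_fraction)
    train_examples = num_examples - eval_examples
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps = math.ceil(train_examples / effective_batch)
    return train_examples, effective_batch, steps


def runtime_hours(steps, seconds_per_step, warmup_seconds=0.0):
    """Wall-clock estimate in hours for a given average step time."""
    return (warmup_seconds + steps * seconds_per_step) / 3600.0


# Numbers taken from the notes above.
train_examples, effective_batch, steps = optimizer_steps(
    num_examples=14492, eval_fraction=0.02, per_device_batch=2, grad_accum=8
)
print(f"train examples: {train_examples}, effective batch: {effective_batch}, steps: {steps}")

# Smoke-run step times (8-11 s) versus the step times implied by 4-8 hours.
for sec in (8, 11):
    print(f"{sec} s/step -> {runtime_hours(steps, sec, warmup_seconds=94):.2f} h")
for hours in (4, 8):
    print(f"{hours} h -> {hours * 3600 / steps:.1f} s/step")
```

With the same assumptions, the smoke run checks out as well: 59 examples at an effective batch size of 16 round up to 4 optimizer steps.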