Qwen3-4B Stage-3 NCE — beam-lastround variant

Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the Stage-2 end ckpt at step 7335. The NCE phase trains the scorer head to rank K=4 candidate completions per block. Proposals are sampled with beam-bayes, but scores are aggregated lastround: only the score from the final (K_inner-th) inner iteration is used for picking, whereas the perround variant averages scores across iterations.
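To make the difference between the two aggregation modes concrete, here is a minimal sketch on toy numbers. It is not the training code: the function names `pick_lastround` and `pick_perround`, the nested-list layout `scores[i][k]`, and the convention that a higher score is better are all assumptions made for illustration.

```python
from statistics import mean

def pick_lastround(scores):
    """Pick the candidate with the best score at the final inner iteration.

    scores[i][k] is the (hypothetical) scorer output for candidate k at
    inner iteration i; higher is assumed to be better.
    """
    final = scores[-1]
    return max(range(len(final)), key=final.__getitem__)

def pick_perround(scores):
    """Pick the candidate with the best score averaged over all iterations."""
    num_candidates = len(scores[0])
    averaged = [mean(row[k] for row in scores) for k in range(num_candidates)]
    return max(range(num_candidates), key=averaged.__getitem__)

# Toy example: 3 inner iterations, K=4 candidates (made-up numbers).
toy_scores = [
    [0.9, 0.1, 0.2, 0.3],
    [0.8, 0.2, 0.3, 0.3],
    [0.4, 0.3, 0.6, 0.2],
]
lastround_choice = pick_lastround(toy_scores)
perround_choice = pick_perround(toy_scores)
```

On these toy scores the two modes disagree: lastround picks candidate 2, which leads only at the last iteration, while perround picks candidate 0, which has the highest average.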

Files

File                 Size
model.pt             8.5 GB
optimizer.pt         0.45 GB
scheduler.pt         1.7 KB
eval_batches.pt      13 MB
rng_rank{0..23}.pt   14.7 KB each

Total: ~9.54 GB / 30 files. Full resume state for re-training.
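Before resuming, it can help to confirm that a local copy is complete. The sketch below expands the `rng_rank{0..23}.pt` pattern and reports which listed files are absent. The helper names and the `ckpt_dir` argument are hypothetical. It covers only the 28 file names itemized in the table; the remaining files in the 30-file total are not named in this card and are therefore not checked.

```python
from pathlib import Path

NUM_RANKS = 24  # rng_rank0.pt through rng_rank23.pt, per the table

def expected_resume_files(num_ranks=NUM_RANKS):
    """Return the file names itemized in the Files table."""
    fixed = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    return fixed + [f"rng_rank{rank}.pt" for rank in range(num_ranks)]

def missing_resume_files(ckpt_dir):
    """List the expected files that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected_resume_files()
            if not (root / name).is_file()]
```

An empty list from `missing_resume_files("path/to/download")` means every itemized file is present; it says nothing about whether the contents are intact.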

Step + lineage

  • Resume from: Stage-2 ckpt at step 7335
  • This ckpt: step 10433 (3098 NCE-phase steps; ~2× faster training than perround due to scoring only the final iter)
  • Backbone: Qwen3-4B-Base (frozen during NCE)
  • Config: configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_lastround_6n.yaml

Eval (Stage-3 BoN with this ckpt)

Audit on gsm8k 200q at α=0.0 (matches Stage-2 baseline):

Setting                       acc
α=0.0 + beam_bayes argmax     84.00%

Matches the perround chain at α=0 within noise: both chains' backbones remain intact under the freeze. The lastround variant trains about 2× faster.
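For a sense of how large "noise" is on a 200-question audit, the sketch below computes the binomial standard error of the reported accuracy. It assumes the questions are independent and uses the normal approximation for the 95% interval; neither assumption is stated in this card, and the function name is illustrative.

```python
import math

def binomial_standard_error(accuracy, num_questions):
    """Standard error of an accuracy estimated from independent trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)

acc = 0.84           # reported accuracy at α=0.0
num_questions = 200  # size of the gsm8k audit subset

se = binomial_standard_error(acc, num_questions)
half_width_95 = 1.96 * se  # normal-approximation 95% half-width

print(f"standard error: {se:.4f}")
print(f"95% interval:   {acc - half_width_95:.3f} to {acc + half_width_95:.3f}")
```

Under these assumptions the standard error is about 2.6 percentage points, so differences of a few points between chains on this subset are not distinguishable from sampling variation.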

Related archives

  • haotiansun014/qwen3-4b-stage3-nce-7335-perround-archive
  • haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive