Qwen3-4B Stage-3 NCE — beam-lastround variant

Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the Stage-2 end ckpt at step 7335. The NCE phase trains the scorer head to rank K=4 candidate completions per block. Proposals are sampled with beam-bayes, but scores are aggregated lastround: only the score from the final (K_inner-th) inner iteration is used to pick a candidate, whereas the perround variant averages scores across iterations.
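
The difference between lastround and perround aggregation can be sketched as below. This is a minimal illustration, not the training code: the function names, the score layout (one row per inner iteration, one column per candidate), and the example numbers are all assumptions made for the sketch.

```python
from statistics import fmean

def pick_lastround(scores):
    """Pick the candidate with the highest score in the final inner iteration."""
    last = scores[-1]
    return max(range(len(last)), key=last.__getitem__)

def pick_perround(scores):
    """Pick the candidate with the highest score averaged over all inner iterations."""
    k = len(scores[0])
    means = [fmean(row[j] for row in scores) for j in range(k)]
    return max(range(k), key=means.__getitem__)

# Hypothetical scores: 3 inner iterations (rows) x K=4 candidates (columns).
scores = [
    [0.9, 0.2, 0.1, 0.3],
    [0.8, 0.3, 0.2, 0.4],
    [0.1, 0.7, 0.2, 0.3],
]

print(pick_lastround(scores))  # 1: best in the last row only
print(pick_perround(scores))   # 0: best column mean (0.6)
```

With these made-up numbers the two rules disagree: candidate 0 leads on average, but candidate 1 leads in the final iteration. Only the last row is read by the lastround rule.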

Files

File	Size
`model.pt`	8.5 GB
`optimizer.pt`	0.45 GB
`scheduler.pt`	1.7 KB
`eval_batches.pt`	13 MB
`rng_rank{0..23}.pt`	14.7 KB each

Total: ~9.54 GB / 30 files. Full resume state for re-training.
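
A downloaded copy can be checked against the table above with a short script. This is a sketch under assumptions: the helper names are hypothetical, `rng_rank{0..23}.pt` is read as ranks 0 through 23 inclusive, and all files are assumed to sit in one flat directory. It covers only the 28 files named in the table, not the full count of 30.

```python
from pathlib import Path

NUM_RANKS = 24  # assumption: rng_rank{0..23}.pt means ranks 0..23 inclusive

def expected_files(num_ranks=NUM_RANKS):
    """Return the file names listed in the Files table."""
    names = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    names += [f"rng_rank{r}.pt" for r in range(num_ranks)]
    return names

def missing_files(ckpt_dir):
    """Return the expected file names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected_files() if not (root / name).is_file()]
```

An empty list from `missing_files(path)` means every file named in the table is present. It does not validate file contents or sizes.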

Step + lineage

Resume from: Stage-2 ckpt at step 7335
This ckpt: step 10433 (3098 NCE-phase steps; ~2× faster training than perround due to scoring only the final iter)
Backbone: Qwen3-4B-Base (frozen during NCE)
Config: configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_lastround_6n.yaml

Eval (Stage-3 BoN with this ckpt)

Audit on 200 GSM8K questions at α=0.0 (matches the Stage-2 baseline):

	acc
α=0.0 + beam_bayes argmax	84.00%

Matches the perround chain at α=0 within noise: both chains' backbones remain intact under the freeze. The lastround variant trains ~2× faster.
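
For a rough sense of how large "noise" is at this sample size, a binomial standard error can be computed from the reported accuracy. This is a sketch of one common way to quantify it, not the procedure used in the audit. It assumes the 200 questions are independent and uses a normal approximation.

```python
import math

def binomial_se(acc, n):
    """Standard error of an accuracy estimate from n independent trials."""
    return math.sqrt(acc * (1.0 - acc) / n)

acc, n = 0.84, 200          # reported accuracy and number of questions
se = binomial_se(acc, n)
half_width_95 = 1.96 * se   # normal-approximation 95% half-width

print(f"SE = {se:.4f}, 95% half-width = {half_width_95:.4f}")
# SE = 0.0259, 95% half-width = 0.0508
```

Under these assumptions, differences of a few accuracy points between two 200-question runs are not distinguishable.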

Related archives

haotiansun014/qwen3-4b-stage3-nce-7335-perround-archive
haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive
