Qwen3-4B Stage-3 NCE — beam-lastround variant
Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the final
Stage-2 checkpoint at step 7335. The NCE phase trains the scorer head to rank
K=4 candidate completions per block. Proposals are sampled with beam-bayes,
but scores are aggregated lastround: only the score from the final
(K_inner-th) inner iteration is used to pick a candidate, whereas the
perround variant averages scores across iterations.
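The difference between the two aggregation modes can be sketched as follows. This is a minimal illustration, not the training code: the function names, the `scores[k][i]` layout (candidate `k`, inner iteration `i`), the toy values, and the assumption that a higher score is better and the pick is an argmax are all hypothetical.

```python
from statistics import mean


def pick_lastround(scores):
    """Pick the candidate with the best score at the final inner iteration.

    scores[k][i] is the (hypothetical) scorer output for candidate k
    at inner iteration i; higher is assumed to be better.
    """
    final = [row[-1] for row in scores]
    return max(range(len(final)), key=final.__getitem__)


def pick_perround(scores):
    """Pick the candidate with the best score averaged over all iterations."""
    avg = [mean(row) for row in scores]
    return max(range(len(avg)), key=avg.__getitem__)


# Toy example: K=4 candidates, 3 inner iterations (made-up numbers).
scores = [
    [0.9, 0.8, 0.2],  # strong early, weak at the end
    [0.1, 0.3, 0.7],  # weak early, strong at the end
    [0.4, 0.4, 0.4],
    [0.5, 0.6, 0.5],
]

lastround_choice = pick_lastround(scores)
perround_choice = pick_perround(scores)
print(lastround_choice, perround_choice)
```

On this toy input the two modes disagree: lastround selects candidate 1 because only the final column matters, while perround selects candidate 0 because of its high early scores.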
Files
| File | Size |
|---|---|
| model.pt | 8.5 GB |
| optimizer.pt | 0.45 GB |
| scheduler.pt | 1.7 KB |
| eval_batches.pt | 13 MB |
| rng_rank{0..23}.pt | 14.7 KB each |
Total: ~9.54 GB / 30 files. Full resume state for re-training.
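Before resuming, it can help to confirm that a local copy contains every file named in the table. The sketch below expands `rng_rank{0..23}.pt` into 24 per-rank names and reports any that are absent. The helper names and the `world_size` parameter are hypothetical, and the check covers only the 28 names the table itemizes, not the full 30-file total.

```python
from pathlib import Path


def expected_resume_files(world_size=24):
    """Return the file names itemized in the table above.

    world_size=24 is inferred from the rng_rank{0..23}.pt pattern.
    """
    fixed = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    rng = [f"rng_rank{rank}.pt" for rank in range(world_size)]
    return fixed + rng


def missing_resume_files(directory, world_size=24):
    """List expected files that do not exist under `directory`."""
    root = Path(directory)
    return [
        name
        for name in expected_resume_files(world_size)
        if not (root / name).is_file()
    ]


names = expected_resume_files()
print(len(names), names[4], names[-1])
```

Calling `missing_resume_files("path/to/checkpoint")` on a complete download should return an empty list.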
Step + lineage
- Resume from: Stage-2 ckpt at step 7335
- This ckpt: step 10433 (3098 NCE-phase steps; ~2× faster training than perround due to scoring only the final iter)
- Backbone: Qwen3-4B-Base (frozen during NCE)
- Config: configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_lastround_6n.yaml
Eval (Stage-3 BoN with this ckpt)
Audit on 200 GSM8K questions at α=0.0 (matches the Stage-2 baseline):
| Setting | acc |
|---|---|
| α=0.0 + beam_bayes argmax | 84.00% |
Matches the perround chain at α=0 (within noise): the backbone is frozen in both chains, so it remains intact. The lastround variant trains ~2× faster.
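To give a rough sense of what "within noise" means at this sample size, the sketch below computes the binomial standard error for 84.00% accuracy on 200 questions. It assumes questions are independent and uses the simple normal-approximation formula; it is an illustration, not part of the evaluation pipeline.

```python
import math


def binomial_stderr(acc, n):
    """Standard error of an accuracy estimate from n independent trials."""
    return math.sqrt(acc * (1.0 - acc) / n)


n_questions = 200
acc = 0.84

correct = round(acc * n_questions)  # number of questions answered correctly
se = binomial_stderr(acc, n_questions)

print(f"{correct}/{n_questions} correct, stderr ~ {100 * se:.1f} points")
```

Under these assumptions the standard error is about 2.6 percentage points, so differences of a point or two between chains on this 200-question audit are not distinguishable from sampling noise.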
Related archives
- haotiansun014/qwen3-4b-stage3-nce-7335-perround-archive
- haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive