Qwen3-4B Stage-3 NCE — beam-perround variant

Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the final Stage-2 checkpoint at step 7335. The NCE phase trains the scorer head to rank K=4 candidate completions per block, using beam-bayes proposal sampling and per-round score combination.
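The sketch below is one way to picture "rank K=4 candidates per block". It computes a K-way softmax contrastive loss (the InfoNCE form of NCE) from scorer-head logits. It assumes that exactly one candidate per block is the positive (for example a reference completion) and that the rest are proposal samples. The function names and the positive-index convention are hypothetical and are not taken from scripts/train_unified.py.

```python
import math
from typing import Sequence


def k_way_nce_loss(scores: Sequence[float], positive_index: int) -> float:
    """Hypothetical K-way contrastive loss for a single block.

    scores: scorer-head logits for the K candidate completions of the block.
    positive_index: which candidate is treated as the positive (assumption).
    Returns -log softmax(scores)[positive_index], computed stably.
    """
    if not 0 <= positive_index < len(scores):
        raise IndexError("positive_index out of range")
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]


def batch_loss(batch_scores, batch_positive) -> float:
    """Mean loss over blocks; each block contributes one K-way term."""
    losses = [k_way_nce_loss(s, p) for s, p in zip(batch_scores, batch_positive)]
    return sum(losses) / len(losses)
```

When all K=4 logits are equal, the loss is log 4. It falls toward 0 as the positive's logit pulls ahead of the other three.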

Files

File	Size
`model.pt`	8.5 GB
`optimizer.pt`	0.45 GB
`scheduler.pt`	1.7 KB
`eval_batches.pt`	13 MB
`rng_rank{0..23}.pt`	14.7 KB each

Total: ~9.54 GB across 30 files, comprising the full state needed to resume training.
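Resuming requires every per-rank RNG file, so a pre-flight check can catch an incomplete download. The sketch below assumes that the `rng_rank{0..23}.pt` pattern expands to `rng_rank0.pt` through `rng_rank23.pt` (24 ranks) and that all files sit directly in the checkpoint directory. The helper names are hypothetical. The table lists 28 of the 30 files (4 core files plus 24 RNG files), and the sketch checks only those.

```python
from pathlib import Path
from typing import List

# Core files from the table above; RNG state is saved separately per rank.
CORE_FILES = ("model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt")


def expected_resume_files(world_size: int = 24) -> List[str]:
    """Filenames listed in the table, with one rng_rank{r}.pt file per rank (assumed naming)."""
    return list(CORE_FILES) + [f"rng_rank{r}.pt" for r in range(world_size)]


def missing_resume_files(ckpt_dir, world_size: int = 24) -> List[str]:
    """Return expected files absent from ckpt_dir; an empty list means the set is complete."""
    root = Path(ckpt_dir)
    return [name for name in expected_resume_files(world_size)
            if not (root / name).is_file()]
```

For example, `missing_resume_files("./checkpoint-00008706-20260501_081812/")` should return `[]` before a resume is attempted.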

Step + lineage

Resume from: Stage-2 ckpt at step 7335 (4B Qwen3-Base CPT pipeline)
This ckpt: step 8706 (1371 steps of NCE training)
Backbone: Qwen3-4B-Base (frozen during the NCE phase)
Training script: scripts/train_unified.py
Config: configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_perround_6n.yaml

Eval (Stage-3 BoN, K=4, R=1, α=0.5, beam_bayes argmax)

benchmark	acc
GSM8K (1319q)	79.83% (1053/1319)
MATH-500 (500q)	48.00% (240/500)
HumanEval (164q)	56.10% (92/164)
GPQA-diamond (193q)	36.27% (70/193)
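This card does not define how α enters the best-of-N (BoN) selection. The sketch below illustrates one possible reading of the eval header and should be treated as an assumption only. Each of the K candidates receives a combined score, α times its scorer output plus (1 − α) times a backbone log-probability, and the argmax candidate is kept. The `Candidate` fields, the linear weighting and the function names are all hypothetical. The actual beam_bayes rule lives in the project code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    text: str
    scorer_score: float  # output of the NCE scorer head (hypothetical field)
    lm_logprob: float    # e.g. mean per-token log-prob under the backbone (hypothetical field)


def combined_score(c: Candidate, alpha: float) -> float:
    """Assumed linear interpolation: alpha=1 uses only the scorer, alpha=0 only the LM."""
    return alpha * c.scorer_score + (1.0 - alpha) * c.lm_logprob


def best_of_n(candidates: Sequence[Candidate], alpha: float = 0.5) -> Candidate:
    """Return the candidate with the highest combined score (argmax; ties keep the first)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: combined_score(c, alpha))
```

Under this reading, α sets how far the trained scorer can override the backbone's own preference. That makes α a natural knob to sweep when reranking fails to beat the baseline.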

Compared with the Stage-2 baseline at step 7335 (greedy uncommitted_soft):

	Stage-3 NCE	Stage-2 baseline	Δ
GSM8K	79.83%	82.64%	−2.81pp
MATH-500	48.00%	51.00%	−3.00pp
HumanEval	56.10%	60.98%	−4.88pp
GPQA	36.27%	36.27%	0.00pp

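Scorer reranking is currently neutral to negative: it trails the Stage-2 baseline by 2.81 to 4.88pp on GSM8K, MATH-500, and HumanEval and ties it on GPQA. See the notes on the `score_scale` plateau in the project README. A learnable-scale re-train (`-b2fix` variant) is queued.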
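The Δ column is an absolute difference in percentage points (pp), not a relative change. The check below recomputes the Stage-3 accuracies from the raw counts in the eval table. It then computes the deltas against the reported Stage-2 percentages, because this card does not give Stage-2 raw counts.

```python
# Stage-3 (correct, total) counts from the eval table; Stage-2 values are reported percentages.
STAGE3_COUNTS = {
    "GSM8K": (1053, 1319),
    "MATH-500": (240, 500),
    "HumanEval": (92, 164),
    "GPQA": (70, 193),
}
STAGE2_PCT = {"GSM8K": 82.64, "MATH-500": 51.00, "HumanEval": 60.98, "GPQA": 36.27}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy in percent, rounded to two decimals as in the tables."""
    return round(100.0 * correct / total, 2)


def delta_pp(new_pct: float, base_pct: float) -> float:
    """Absolute difference in percentage points, rounded to two decimals."""
    return round(new_pct - base_pct, 2)


for name, (correct, total) in STAGE3_COUNTS.items():
    acc = accuracy_pct(correct, total)
    print(f"{name}: {acc:.2f}% ({delta_pp(acc, STAGE2_PCT[name]):+.2f}pp)")
```

Running it reproduces the Stage-3 column and the Δ column above.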

Loading

```python
from draft_refine.training.checkpointing import load_full_state

ckpt = load_full_state("./checkpoint-00008706-20260501_081812/")
# ckpt.model contains the DiffusionLM with the scorer head attached
```

Related archives

haotiansun014/qwen3-4b-stage3-nce-7335-lastround-archive: same training recipe, but with combine=lastround (final-round score instead of per-round combination).
haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive: softmax_sampling proposal at T=0.7 (vs. argmax in this beam-perround variant).
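The archives differ along two axes: how per-round scores are combined, and how the proposal picks from its logits. The sketch below contrasts both axes under stated assumptions. `perround` is taken to mean the average of a candidate's scores over refinement rounds, and `lastround` keeps only the final round's score. `softmax_sampling` draws an index from softmax(logits / T), while `argmax` takes the maximum. The mean aggregation, the function names and the generic "logits" input are hypothetical. This card does not say what the beam-bayes proposal's logits are.

```python
import math
import random
from typing import Optional, Sequence


def combine_scores(round_scores: Sequence[float], mode: str) -> float:
    """Collapse one candidate's per-round scores into a single number.

    'perround'  -> mean over all rounds (assumed aggregation)
    'lastround' -> score from the final round only
    """
    if not round_scores:
        raise ValueError("need at least one round")
    if mode == "perround":
        return sum(round_scores) / len(round_scores)
    if mode == "lastround":
        return round_scores[-1]
    raise ValueError(f"unknown combine mode: {mode}")


def propose(logits: Sequence[float], mode: str = "argmax",
            temperature: float = 0.7, rng: Optional[random.Random] = None) -> int:
    """Pick an index greedily (argmax) or by sampling from softmax(logits / temperature)."""
    if mode == "argmax":
        return max(range(len(logits)), key=lambda i: logits[i])
    if mode == "softmax_sampling":
        rng = rng or random.Random()
        m = max(logits)  # shift by the max so exp() cannot overflow
        weights = [math.exp((x - m) / temperature) for x in logits]
        return rng.choices(range(len(logits)), weights=weights, k=1)[0]
    raise ValueError(f"unknown proposal mode: {mode}")
```

In this sketch, both combine rules return the same value when there is only one round. A lower temperature makes `softmax_sampling` behave more like `argmax`.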
