Qwen3-4B Stage-3 NCE — beam-perround variant
Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the final Stage-2 checkpoint at step 7335. The NCE phase trains the scorer head to rank K=4 candidate completions per block, using beam-bayes proposal sampling and per-round score combination.
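For orientation, ranking K=4 candidates per block can be sketched as a softmax cross-entropy over the candidates' scores. This is a generic ranking-NCE form and may differ from the loss implemented in the training script. The function name `ranking_nce_loss`, the proposal log-probabilities `log_q`, and the convention that exactly one candidate is the positive are assumptions made for illustration.

```python
import math


def ranking_nce_loss(scores, log_q, positive_index):
    """Generic ranking-NCE loss for one block (illustrative, not the repo's code).

    scores:         scorer-head outputs, one per candidate (K values)
    log_q:          proposal log-probabilities of each candidate (assumed input)
    positive_index: index of the candidate treated as the positive (assumed)
    """
    # Correct each score by the proposal log-probability, as in ranking NCE.
    logits = [s - lq for s, lq in zip(scores, log_q)]
    # Numerically stable log-sum-exp over the K candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of the positive under the softmax.
    return log_z - logits[positive_index]


# With K=4 identical candidates the scorer is at chance, so the loss is log(4).
uniform_loss = ranking_nce_loss([0.0] * 4, [0.0] * 4, positive_index=0)
```

A scorer that assigns the positive a higher score than the other three candidates drives this loss below log(4).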
Files
| File | Size |
|---|---|
| `model.pt` | 8.5 GB |
| `optimizer.pt` | 0.45 GB |
| `scheduler.pt` | 1.7 KB |
| `eval_batches.pt` | 13 MB |
| `rng_rank{0..23}.pt` | 14.7 KB each |
Total: ~9.54 GB / 30 files. Full resume state for re-training.
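A minimal sketch for checking that a downloaded checkpoint directory contains the files itemized above. It assumes `rng_rank{0..23}.pt` expands to 24 files named `rng_rank0.pt` through `rng_rank23.pt` with no zero padding. The helper names `expected_files` and `missing_files` are hypothetical. The table itemizes 28 files, so the two remaining files in the 30-file total are not covered here.

```python
from pathlib import Path


def expected_files(num_ranks: int = 24) -> list[str]:
    """Names itemized in the Files table (assumes unpadded rank suffixes)."""
    fixed = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    rng = [f"rng_rank{rank}.pt" for rank in range(num_ranks)]
    return fixed + rng


def missing_files(ckpt_dir: str) -> list[str]:
    """Return the itemized files that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected_files() if not (root / name).is_file()]
```

Calling `missing_files("./checkpoint-00008706-20260501_081812/")` should return an empty list for a complete download under these assumptions.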
Step + lineage
- Resume from: Stage-2 ckpt at step 7335 (4B Qwen3-Base CPT pipeline)
- This ckpt: step 8706 (1371 steps of NCE training)
- Backbone: Qwen3-4B-Base (frozen during NCE phase)
- Training script: `scripts/train_unified.py`
- Config: `configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_perround_6n.yaml`
Eval (Stage-3 BoN, K=4, R=1, α=0.5, beam_bayes argmax)
| benchmark | acc |
|---|---|
| GSM8K (1319q) | 79.83% (1053/1319) |
| MATH-500 (500q) | 48.00% (240/500) |
| HumanEval (164q) | 56.10% (92/164) |
| GPQA-diamond (193q) | 36.27% (70/193) |
Compared to Stage-2 baseline at step 7335 (greedy uncommitted_soft):
| benchmark | Stage-3 NCE | Stage-2 baseline | Δ |
|---|---|---|---|
| GSM8K | 79.83% | 82.64% | −2.81pp |
| MATH-500 | 48.00% | 51.00% | −3.00pp |
| HumanEval | 56.10% | 60.98% | −4.88pp |
| GPQA | 36.27% | 36.27% | 0 |
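The percentages and deltas in the two tables can be recomputed from the reported counts. The sketch below uses only numbers from this card; the helper names `accuracy_pct` and `delta_pp` are illustrative.

```python
# (correct, total, Stage-2 baseline accuracy in %) as reported in this card.
results = {
    "GSM8K": (1053, 1319, 82.64),
    "MATH-500": (240, 500, 51.00),
    "HumanEval": (92, 164, 60.98),
    "GPQA-diamond": (70, 193, 36.27),
}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def delta_pp(stage3_pct: float, stage2_pct: float) -> float:
    """Stage-3 minus Stage-2, in percentage points."""
    return round(stage3_pct - stage2_pct, 2)


for name, (correct, total, baseline) in results.items():
    acc = accuracy_pct(correct, total)
    print(f"{name}: {acc:.2f}% ({delta_pp(acc, baseline):+.2f} pp vs Stage-2)")
```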
The scorer rerank is currently neutral to negative: it ties the Stage-2 baseline on GPQA and trails it by 2.81 to 4.88 pp on the other three benchmarks. See the notes on the score_scale plateau in the project README. A learnable-scale re-train (-b2fix variant) is queued.
Loading
```python
from draft_refine.training.checkpointing import load_full_state

ckpt = load_full_state("./checkpoint-00008706-20260501_081812/")
# ckpt.model contains the DiffusionLM with scorer head attached
```
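The checkpoint directory name appears to encode the training step and a save timestamp. The sketch below parses it under the assumption that the layout is `checkpoint-<step>-<YYYYMMDD_HHMMSS>`. That layout is inferred from the single path shown here, and `parse_checkpoint_name` is a hypothetical helper, not part of `draft_refine`.

```python
import re
from datetime import datetime

# Assumed layout: checkpoint-<zero-padded step>-<YYYYMMDD_HHMMSS>
_CKPT_NAME = re.compile(r"^checkpoint-(\d+)-(\d{8}_\d{6})$")


def parse_checkpoint_name(path: str) -> tuple[int, datetime]:
    """Extract (step, timestamp) from a checkpoint directory path."""
    name = path.rstrip("/").split("/")[-1]
    match = _CKPT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint directory name: {name!r}")
    step = int(match.group(1))
    saved_at = datetime.strptime(match.group(2), "%Y%m%d_%H%M%S")
    return step, saved_at


step, saved_at = parse_checkpoint_name("./checkpoint-00008706-20260501_081812/")
```

Here `step` comes out as 8706, matching the step listed under "Step + lineage".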
Related archives
- haotiansun014/qwen3-4b-stage3-nce-7335-lastround-archive: same training recipe but with `combine=lastround` (final-round score, not per-round).
- haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive: softmax_sampling proposal at T=0.7 (vs argmax in beam-perround).