---
license: apache-2.0
language: en
tags:
  - draft-refine
  - block-diffusion
  - nce
  - qwen3-4b
---

# Qwen3-4B Stage-3 NCE — beam-perround variant

Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the Stage-2
final checkpoint at step 7335. The NCE phase trains the scorer head to rank K=4
candidate completions per block, using `beam_bayes` proposal sampling and
per-round score combination.
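As a rough illustration of what "rank K=4 candidates" can mean, the sketch below scores one block with a softmax ranking loss over four candidate scores. This is an assumption about the general shape of an NCE-style ranking objective, not the loss implemented in `scripts/train_unified.py`; the function name `ranking_nce_loss` and the example scores are hypothetical.

```python
import math


def ranking_nce_loss(scores, positive_index):
    """Softmax cross-entropy over K candidate scores (hypothetical helper).

    The loss is low when the candidate at `positive_index` has the
    highest score among the K candidates, and high otherwise.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]


# K=4 made-up scorer outputs for one block.
scores = [2.0, 0.5, -1.0, 0.0]
loss_good = ranking_nce_loss(scores, positive_index=0)  # best candidate ranked first
loss_bad = ranking_nce_loss(scores, positive_index=2)   # best candidate ranked last
print(f"loss_good={loss_good:.4f} loss_bad={loss_bad:.4f}")
```

With uniform scores the loss equals `log(K)`, which gives a simple reference point when reading training curves under this assumed objective.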

## Files

| File | Size |
|---|---|
| `model.pt` | 8.5 GB |
| `optimizer.pt` | 0.45 GB |
| `scheduler.pt` | 1.7 KB |
| `eval_batches.pt` | 13 MB |
| `rng_rank{0..23}.pt` | 14.7 KB each |

Total: ~9.54 GB across 30 files, the full resume state needed to continue training.
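Before resuming, it can help to confirm that a downloaded copy contains every checkpoint file named in the table. The sketch below checks only those 28 names (four state files plus `rng_rank0.pt` through `rng_rank23.pt`); the helpers `expected_files` and `missing_files` are hypothetical and not part of the project code.

```python
from pathlib import Path

NUM_RANKS = 24  # rng_rank0.pt ... rng_rank23.pt, per the table above


def expected_files(num_ranks=NUM_RANKS):
    """Return the checkpoint file names listed in the Files table."""
    names = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    names += [f"rng_rank{rank}.pt" for rank in range(num_ranks)]
    return names


def missing_files(ckpt_dir):
    """Return the expected file names that are absent from `ckpt_dir`."""
    root = Path(ckpt_dir)
    return [name for name in expected_files() if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./checkpoint-00008706-20260501_081812/")
    print("missing:", missing if missing else "none")
```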

## Step + lineage

- Resume from: Stage-2 ckpt at step 7335 (4B Qwen3-Base CPT pipeline)
- This ckpt: step 8706 (1371 steps of NCE training)
- Backbone: Qwen3-4B-Base (frozen during NCE phase)
- Training script: `scripts/train_unified.py`
- Config: `configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_perround_6n.yaml`

## Eval (Stage-3 BoN, K=4, R=1, α=0.5, beam_bayes argmax)

| benchmark | acc |
|---|---:|
| GSM8K (1319q) | 79.83% (1053/1319) |
| MATH-500 (500q) | 48.00% (240/500) |
| HumanEval (164q) | 56.10% (92/164) |
| GPQA-diamond (193q) | 36.27% (70/193) |

Compared to the Stage-2 baseline at step 7335 (greedy uncommitted_soft):

| | Stage-3 NCE | Stage-2 baseline | Δ |
|---|---:|---:|---:|
| GSM8K | 79.83% | 82.64% | −2.81pp |
| MATH-500 | 48.00% | 51.00% | −3.00pp |
| HumanEval | 56.10% | 60.98% | −4.88pp |
| GPQA | 36.27% | 36.27% | 0 |
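The percentages and deltas in the two tables can be recomputed from the reported correct/total counts and the Stage-2 baseline accuracies. The snippet below is only an arithmetic check on the numbers above; the variable and function names are illustrative.

```python
# (correct, total, Stage-2 baseline accuracy in %), copied from the tables above.
results = {
    "GSM8K": (1053, 1319, 82.64),
    "MATH-500": (240, 500, 51.00),
    "HumanEval": (92, 164, 60.98),
    "GPQA": (70, 193, 36.27),
}


def pct(correct, total):
    """Accuracy in percent, rounded to two decimals as in the tables."""
    return round(100.0 * correct / total, 2)


for name, (correct, total, baseline) in results.items():
    acc = pct(correct, total)
    delta = acc - baseline  # difference in percentage points
    print(f"{name}: {acc:.2f}% (delta {delta:+.2f} pp)")
```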

The scorer rerank does not yet improve on the Stage-2 baseline: it is neutral
on GPQA and 2.8 to 4.9 pp lower on the other three benchmarks. See the notes
on the `score_scale` plateau in the project README. A learnable-scale
re-train (`-b2fix` variant) is queued.

## Loading

```python
from draft_refine.training.checkpointing import load_full_state
ckpt = load_full_state("./checkpoint-00008706-20260501_081812/")
# ckpt.model contains the DiffusionLM with scorer head attached
```

## Related archives

- `haotiansun014/qwen3-4b-stage3-nce-7335-lastround-archive` — same training
  recipe but with `combine=lastround` (final-round score, not per-round).
- `haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive` — softmax_sampling
  proposal at T=0.7 (vs argmax in beam-perround).
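To make the argmax versus temperature-sampling distinction concrete, here is a minimal, generic sketch of the two selection rules over a small list of logits. It does not reproduce the project's proposal code; the function names and the logits are made up, and only the T=0.7 value comes from the description above.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def pick_argmax(logits):
    """Deterministic choice: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def pick_sampled(logits, temperature, rng):
    """Stochastic choice: sample an index from the tempered softmax."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.2, 0.9, -0.3, 0.1]  # made-up proposal logits
rng = random.Random(0)          # seeded so repeated runs match
greedy = pick_argmax(logits)
sampled = [pick_sampled(logits, 0.7, rng) for _ in range(8)]
print("argmax:", greedy, "sampled at T=0.7:", sampled)
```

Argmax always returns the same candidate for the same logits, while sampling at T=0.7 spreads proposals across lower-ranked candidates, which changes the pool the scorer gets to rank.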