---
license: apache-2.0
language: en
tags:
- draft-refine
- block-diffusion
- nce
- qwen3-4b
---

# Qwen3-4B Stage-3 NCE: temp07-perround variant

Stage-3 noise-contrastive estimation (NCE) training, resumed from the final
Stage-2 checkpoint at step 7335. The NCE phase trains the scorer head to rank
K=4 candidate completions per block. This variant differs from beam-perround
in how proposals are sampled: it uses `softmax_sampling` at
`proposal_temperature=0.7` instead of argmax. This is closer to the noise the
scorer sees at training time.
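The following is a minimal sketch of what ranking K=4 candidates with an NCE-style objective can look like. It assumes the scorer head emits one scalar score per candidate and that exactly one candidate per block is the positive (for example, the reference completion), with the others being sampled proposals. The function name `nce_ranking_loss` and the positive-index convention are hypothetical. The actual Stage-3 loss is defined by the training code, not by this snippet.

```python
import math
from typing import Sequence


def nce_ranking_loss(scores: Sequence[float], positive_index: int = 0) -> float:
    """Softmax cross-entropy of the positive candidate among K scored candidates.

    scores: one scalar score per candidate (K=4 per block in this run).
    positive_index: which candidate is treated as the positive (assumption).
    """
    if not scores:
        raise ValueError("need at least one candidate score")
    m = max(scores)  # log-sum-exp with the max subtracted for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    # -log softmax(scores)[positive_index]
    return log_z - scores[positive_index]
```

With equal scores the loss is log K (log 4 ≈ 1.386 for K=4). It falls toward 0 as the positive's score pulls ahead of the other candidates.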

## Files

| File | Size |
|---|---|
| `model.pt` | 8.5 GB |
| `optimizer.pt` | 0.45 GB |
| `scheduler.pt` | 1.7 KB |
| `eval_batches.pt` | 13 MB |
| `rng_rank{0..23}.pt` | 14.7 KB each |

Total: ~9.54 GB across 30 files (the table lists 28 of them). Together they
hold the full state needed to resume training.
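Before resuming, it can help to confirm that a local copy contains every file listed above. The sketch below uses a hypothetical helper that is not part of the training code. It assumes 24 ranks, as implied by `rng_rank{0..23}.pt`, and only checks that each file exists. It does not validate file contents.

```python
from __future__ import annotations

from pathlib import Path

WORLD_SIZE = 24  # assumption: rng_rank{0..23}.pt implies 24 training ranks


def expected_checkpoint_files(world_size: int = WORLD_SIZE) -> list[str]:
    """Names of the resume-state files listed in the Files table."""
    names = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    names += [f"rng_rank{rank}.pt" for rank in range(world_size)]  # one RNG state per rank
    return names


def missing_checkpoint_files(ckpt_dir: str | Path, world_size: int = WORLD_SIZE) -> list[str]:
    """Return the expected files that are absent from ckpt_dir (empty list = complete)."""
    root = Path(ckpt_dir)
    return [name for name in expected_checkpoint_files(world_size) if not (root / name).is_file()]
```

Usage: call `missing_checkpoint_files("path/to/local/copy")` and resume only if it returns an empty list.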

## Step + lineage

- Resume from: Stage-2 ckpt at step 7335
- This ckpt: step 9124 (1789 NCE-phase steps)
- Proposal: `combine=softmax_sampling`, `proposal_temperature=0.7`
- Config: `configs/large_scale/qwen3_4b_stage3_nce_resume7335_temp07_perround_6n.yaml`

## Note on inference reproducibility

When evaluating this ckpt, use the training-time proposal settings so that
the scorer sees in-distribution candidates:
```
combine=softmax_sampling proposal_temperature=0.7
```
Using `combine=beam_bayes argmax` (as for the perround/lastround variants) on
this ckpt produces an out-of-distribution scorer signal.
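The difference between the two proposal modes can be sketched as generic temperature sampling versus greedy argmax over a vector of logits. This is an illustration under stated assumptions, not the project's `combine` implementation. The helper names are hypothetical, and the snippet does not show how `softmax_sampling` or `beam_bayes` combine scores across candidates.

```python
from __future__ import annotations

import math
import random
from typing import Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_proposal(logits: Sequence[float], temperature: Optional[float], rng: random.Random) -> int:
    """Greedy argmax when temperature is None; otherwise sample from the tempered softmax."""
    if temperature is None:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

At a temperature of 0.7 the distribution is sharper than at 1.0 but still stochastic, so lower-scoring candidates are still proposed some of the time. Argmax always returns the same candidate for the same logits.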

## Related archives

- `haotiansun014/qwen3-4b-stage3-nce-7335-perround-archive`
- `haotiansun014/qwen3-4b-stage3-nce-7335-lastround-archive`