haotiansun014/qwen3-4b-stage3-nce-7335-perround-archive

Commit 2f442c2 (verified), parent b79c30e: "Add README". Committed by haotiansun014 on May 2.

Files changed (1): README.md (added, +72 -0)

---
license: apache-2.0
language: en
tags:
  - draft-refine
  - block-diffusion
  - nce
  - qwen3-4b
---

# Qwen3-4B Stage-3 NCE: beam-perround variant

Stage-3 Noise-Contrastive Estimation (NCE) training, resumed from the final
Stage-2 checkpoint at step 7335. The NCE phase trains the scorer head to rank
K=4 candidate completions per block, using beam-bayes proposal sampling and
per-round score combination.
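
The README does not spell out the NCE objective. One common way to train a scorer to rank K candidates contrastively is a softmax cross-entropy (InfoNCE form) over the K scores, with one candidate treated as the positive and the rest as noise. The sketch below assumes that form; the function name, the single-positive setup, and the `scale` multiplier are illustrative assumptions, not code from `scripts/train_unified.py`. (`scale` is only loosely analogous to the `score_scale` mentioned later; how the project applies `score_scale` is not documented here.)

```python
import math
from typing import Sequence


def nce_ranking_loss(scores: Sequence[float], positive_idx: int, scale: float = 1.0) -> float:
    """InfoNCE-style loss over the K candidate scores of one block.

    ASSUMPTION: exactly one of the K candidates is the positive; the others
    act as noise samples. `scale` is a hypothetical logit multiplier.
    """
    if not 0 <= positive_idx < len(scores):
        raise IndexError("positive_idx out of range")
    logits = [scale * s for s in scores]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    log_partition = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log of the softmax probability assigned to the positive.
    return log_partition - logits[positive_idx]


# K=4 candidates with identical scores: the loss is log(4), i.e. chance level.
uniform = nce_ranking_loss([0.0, 0.0, 0.0, 0.0], positive_idx=0)
# Scoring the positive above the noise candidates lowers the loss.
ranked = nce_ranking_loss([2.0, 0.0, 0.0, 0.0], positive_idx=0)
```

With equal scores the loss equals log K (about 1.386 for K=4), so under this formulation a scorer whose loss stays near that value is not separating the candidates.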

## Files

| File | Size |
|---|---|
| `model.pt` | 8.5 GB |
| `optimizer.pt` | 0.45 GB |
| `scheduler.pt` | 1.7 KB |
| `eval_batches.pt` | 13 MB |
| `rng_rank{0..23}.pt` | 14.7 KB each |

Total: ~9.54 GB across 30 files. Together these form the full resume state for continuing training.
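
Before resuming, it can help to confirm that a download contains every piece of resume state. A minimal sketch, assuming the `rng_rank{0..23}.pt` pattern expands to unpadded names `rng_rank0.pt` through `rng_rank23.pt` (one RNG file per rank, 24 ranks). It checks only the 28 files named in the table above, and all helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Files named in the table. ASSUMPTION: RNG files are unpadded (rng_rank0.pt).
CORE_FILES = ("model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt")


def expected_resume_files(world_size: int = 24) -> List[str]:
    """All per-checkpoint files, including one RNG state file per rank."""
    return list(CORE_FILES) + [f"rng_rank{rank}.pt" for rank in range(world_size)]


def missing_resume_files(present: Iterable[str], world_size: int = 24) -> List[str]:
    """Expected file names that are absent from `present`, in table order."""
    present_set = set(present)
    return [name for name in expected_resume_files(world_size) if name not in present_set]


def missing_in_dir(ckpt_dir: str, world_size: int = 24) -> List[str]:
    """Convenience wrapper: check the regular files in a local checkpoint dir."""
    names = (p.name for p in Path(ckpt_dir).iterdir() if p.is_file())
    return missing_resume_files(names, world_size)
```

For example, `missing_in_dir("./checkpoint-00008706-20260501_081812/")` should return an empty list when all 28 listed files are present. Keeping the directory scan separate from the set comparison makes the check easy to test without touching the filesystem.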

## Step + lineage

- Resumed from: Stage-2 checkpoint at step 7335 (4B Qwen3-Base CPT pipeline)
- This checkpoint: step 8706 (1371 steps of NCE training)
- Backbone: Qwen3-4B-Base (frozen during the NCE phase)
- Training script: `scripts/train_unified.py`
- Config: `configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_perround_6n.yaml`

## Eval (Stage-3 BoN, K=4, R=1, α=0.5, beam_bayes argmax)

| Benchmark | Accuracy |
|---|---:|
| GSM8K (1319q) | 79.83% (1053/1319) |
| MATH-500 (500q) | 48.00% (240/500) |
| HumanEval (164q) | 56.10% (92/164) |
| GPQA-diamond (193q) | 36.27% (70/193) |
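
The eval heading lists best-of-N settings without defining them. As an illustration only, the sketch below reads BoN as best-of-N, treats K=4 as four candidates per block, and picks the argmax of a combined score. It assumes α linearly weights the scorer output against a second per-candidate score (here a hypothetical language-model log-likelihood); the README does not define α, R, or the beam_bayes proposal, so every name here is an assumption.

```python
from typing import Sequence


def select_best_of_k(
    scorer_scores: Sequence[float],
    lm_logprobs: Sequence[float],
    alpha: float = 0.5,
) -> int:
    """Return the index of the candidate with the highest combined score.

    HYPOTHETICAL: alpha linearly interpolates the scorer-head output with a
    per-candidate log-likelihood. The README does not define alpha.
    """
    if not scorer_scores or len(scorer_scores) != len(lm_logprobs):
        raise ValueError("need one scorer score and one log-prob per candidate")
    combined = [alpha * s + (1.0 - alpha) * lp for s, lp in zip(scorer_scores, lm_logprobs)]
    # max() returns the first maximal index, so ties go to the earliest candidate.
    return max(range(len(combined)), key=combined.__getitem__)


# K=4 candidates for one block (made-up scores, for illustration only).
best = select_best_of_k([0.1, 0.9, 0.4, 0.2], [-1.0, -3.0, -0.5, -2.0], alpha=0.5)
```

In this sketch, alpha=1.0 makes the selection depend on the scorer alone and alpha=0.0 ignores the scorer entirely.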

Compared with the Stage-2 baseline at step 7335 (greedy decoding, `uncommitted_soft`):

| Benchmark | Stage-3 NCE | Stage-2 baseline | Δ (pp) |
|---|---:|---:|---:|
| GSM8K | 79.83% | 82.64% | −2.81 |
| MATH-500 | 48.00% | 51.00% | −3.00 |
| HumanEval | 56.10% | 60.98% | −4.88 |
| GPQA-diamond | 36.27% | 36.27% | 0.00 |
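
Scorer reranking is currently neutral to slightly negative: it ties the Stage-2
baseline on GPQA-diamond and trails it by 2.81 to 4.88 pp on the other three
benchmarks. See the notes on the `score_scale` plateau in the project README.
A re-train with a learnable scale (the `-b2fix` variant) is queued.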
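
The accuracies and deltas in the eval tables follow directly from the reported numbers. A small sketch that recomputes them from the Stage-3 correct/total counts and the Stage-2 percentages (the helper names are illustrative):

```python
# Stage-3 NCE counts (correct, total) from the eval table.
STAGE3_COUNTS = {
    "GSM8K": (1053, 1319),
    "MATH-500": (240, 500),
    "HumanEval": (92, 164),
    "GPQA-diamond": (70, 193),
}
# Stage-2 baseline accuracies (%) as reported in the comparison table.
STAGE2_PCT = {"GSM8K": 82.64, "MATH-500": 51.00, "HumanEval": 60.98, "GPQA-diamond": 36.27}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals like the tables."""
    return round(100.0 * correct / total, 2)


def delta_pp(stage3_pct: float, stage2_pct: float) -> float:
    """Difference in percentage points (negative means Stage-3 is worse)."""
    return round(stage3_pct - stage2_pct, 2)


for name, (correct, total) in STAGE3_COUNTS.items():
    acc = accuracy_pct(correct, total)
    base = STAGE2_PCT[name]
    print(f"{name}: {acc:.2f}% vs {base:.2f}% -> {delta_pp(acc, base):+.2f} pp")
```

Rounding only at the end of each step keeps floating-point noise (for example 79.83 − 82.64 evaluating to −2.8100000000000023) out of the reported deltas.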

## Loading

```python
from draft_refine.training.checkpointing import load_full_state

ckpt = load_full_state("./checkpoint-00008706-20260501_081812/")
# ckpt.model contains the DiffusionLM with the scorer head attached
```

## Related archives

- `haotiansun014/qwen3-4b-stage3-nce-7335-lastround-archive`: same training
  recipe, but with `combine=lastround` (final-round score instead of per-round).
- `haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive`: `softmax_sampling`
  proposal at T=0.7 (instead of the argmax proposal used in beam-perround).
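
The three archives differ in how per-round scores are combined and how proposals are chosen. A rough sketch of both knobs under stated assumptions: "perround" is modeled as the mean of a candidate's scores over all refinement rounds (the README does not say mean versus sum or a weighting), "lastround" keeps only the final round, and the softmax proposal samples from softmax(scores / T) instead of taking the argmax. The function names and the "temperature 0 means argmax" convention are hypothetical, not the project's API.

```python
import math
import random
from typing import Optional, Sequence


def combine_round_scores(round_scores: Sequence[float], combine: str = "perround") -> float:
    """Collapse one candidate's per-round scores into a single score.

    ASSUMPTION: "perround" is modeled as the mean over rounds; "lastround"
    uses only the final round, as the lastround archive describes.
    """
    if not round_scores:
        raise ValueError("need at least one round")
    if combine == "perround":
        return sum(round_scores) / len(round_scores)
    if combine == "lastround":
        return round_scores[-1]
    raise ValueError(f"unknown combine mode: {combine!r}")


def pick_proposal(
    scores: Sequence[float],
    temperature: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Argmax when temperature <= 0, otherwise sample from softmax(scores / T)."""
    if not scores:
        raise ValueError("need at least one candidate")
    if temperature <= 0.0:
        return max(range(len(scores)), key=scores.__getitem__)
    rng = rng if rng is not None else random.Random()
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalized softmax weights
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]
```

Under these assumptions, `pick_proposal(scores, temperature=0.7)` mirrors the temp07 archive's proposal, while the default `temperature=0.0` mirrors the argmax proposal of beam-perround. Passing a seeded `random.Random` makes the sampled proposals reproducible.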