haotiansun014 committed on
Commit 2f442c2 · verified · 1 Parent(s): b79c30e

Add README

Files changed (1)
  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ license: apache-2.0
+ language: en
+ tags:
+ - draft-refine
+ - block-diffusion
+ - nce
+ - qwen3-4b
+ ---
+
+ # Qwen3-4B Stage-3 NCE — beam-perround variant
+
+ Stage-3 noise-contrastive estimation (NCE) training, resumed from the Stage-2
+ end checkpoint at step 7335. The NCE phase trains the scorer head to rank K=4
+ candidate completions per block, with beam-bayes proposal sampling and
+ per-round score combination.
+
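As a rough illustration of what ranking K=4 candidates can look like, the sketch below computes a softmax-based ranking loss (one common NCE-style objective) over scorer outputs for a single block. The README does not state the exact loss, so the function name `ranking_nce_loss`, the single-positive setup, and the example scores are all assumptions for illustration, not the code in `scripts/train_unified.py`.

```python
import math


def ranking_nce_loss(scores, positive_index):
    """Softmax ranking loss over candidate scores (hypothetical sketch).

    scores: one scorer-head output per candidate (K=4 in this README).
    positive_index: index of the candidate treated as preferred. This is an
    assumption; the README does not say how positives are chosen.
    """
    if not 0 <= positive_index < len(scores):
        raise IndexError("positive_index out of range")
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    # Negative log-probability of the positive candidate under the softmax.
    return log_norm - scores[positive_index]


# Made-up scores for the K=4 candidates of one block.
example_scores = [2.0, 0.5, -1.0, 0.0]
loss_if_top_is_positive = ranking_nce_loss(example_scores, 0)
loss_if_bottom_is_positive = ranking_nce_loss(example_scores, 2)
print(f"{loss_if_top_is_positive:.4f} vs {loss_if_bottom_is_positive:.4f}")
```

Under this assumed objective, the loss is small when the scorer already ranks the preferred candidate highest and grows as that candidate's score falls behind the others.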
+ ## Files
+
+ | File | Size |
+ |---|---|
+ | `model.pt` | 8.5 GB |
+ | `optimizer.pt` | 0.45 GB |
+ | `scheduler.pt` | 1.7 KB |
+ | `eval_batches.pt` | 13 MB |
+ | `rng_rank{0..23}.pt` | 14.7 KB each |
+
+ Total: ~9.54 GB / 30 files. Full resume state for re-training.
+
+ ## Step + lineage
+
+ - Resume from: Stage-2 ckpt at step 7335 (4B Qwen3-Base CPT pipeline)
+ - This ckpt: step 8706 (1371 steps of NCE training)
+ - Backbone: Qwen3-4B-Base (frozen during NCE phase)
+ - Training script: `scripts/train_unified.py`
+ - Config: `configs/large_scale/qwen3_4b_stage3_nce_resume7335_beam_perround_6n.yaml`
+
+ ## Eval (Stage-3 BoN, K=4, R=1, α=0.5, beam_bayes argmax)
+
+ | Benchmark | Accuracy |
+ |---|---:|
+ | GSM8K (1319q) | 79.83% (1053/1319) |
+ | MATH-500 (500q) | 48.00% (240/500) |
+ | HumanEval (164q) | 56.10% (92/164) |
+ | GPQA-diamond (193q) | 36.27% (70/193) |
+
+ Comparison with the Stage-2 baseline at step 7335 (greedy `uncommitted_soft`):
+ | Benchmark | Stage-3 NCE | Stage-2 baseline | Δ |
+ |---|---:|---:|---:|
+ | GSM8K | 79.83% | 82.64% | −2.81pp |
+ | MATH-500 | 48.00% | 51.00% | −3.00pp |
+ | HumanEval | 56.10% | 60.98% | −4.88pp |
+ | GPQA | 36.27% | 36.27% | 0 |
+
+ Scorer reranking is currently neutral to negative relative to the Stage-2
+ baseline (Δ from 0 to −4.88pp); see the notes on the `score_scale` plateau in
+ the project README. A learnable-scale re-train (`-b2fix` variant) is queued.
+
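The percentages and deltas in the two eval tables can be re-derived from the raw counts. The sketch below is only a consistency check on the numbers reported in this README; the dictionary and function names are illustrative, and the Stage-2 accuracies are copied from the comparison table since their raw counts are not given.

```python
# (correct, total) counts copied from the Stage-3 eval table.
stage3_counts = {
    "GSM8K": (1053, 1319),
    "MATH-500": (240, 500),
    "HumanEval": (92, 164),
    "GPQA-diamond": (70, 193),
}

# Stage-2 baseline accuracies in percent, copied from the comparison table.
stage2_acc = {
    "GSM8K": 82.64,
    "MATH-500": 51.00,
    "HumanEval": 60.98,
    "GPQA-diamond": 36.27,
}


def accuracy_pct(correct, total):
    """Accuracy in percent, rounded to two decimals as in the tables."""
    return round(100.0 * correct / total, 2)


def delta_pp(name):
    """Stage-3 minus Stage-2 accuracy, in percentage points."""
    correct, total = stage3_counts[name]
    return round(accuracy_pct(correct, total) - stage2_acc[name], 2)


for name, (correct, total) in stage3_counts.items():
    print(f"{name}: {accuracy_pct(correct, total):.2f}% ({delta_pp(name):+.2f}pp)")
```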
+ ## Loading
+
+ ```python
+ from draft_refine.training.checkpointing import load_full_state
+ ckpt = load_full_state("./checkpoint-00008706-20260501_081812/")
+ # ckpt.model contains the DiffusionLM with scorer head attached
+ ```
+
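Before calling the loader, it can help to confirm that the download is complete. The sketch below checks a checkpoint directory for the `.pt` files named in the Files table; it assumes `rng_rank{0..23}.pt` expands to `rng_rank0.pt` through `rng_rank23.pt`, and the helper names are hypothetical (they are not part of `draft_refine`).

```python
from pathlib import Path


def expected_checkpoint_files(num_ranks=24):
    """File names listed in the Files table (assumed rng_rank naming)."""
    names = ["model.pt", "optimizer.pt", "scheduler.pt", "eval_batches.pt"]
    names += [f"rng_rank{rank}.pt" for rank in range(num_ranks)]
    return names


def missing_checkpoint_files(checkpoint_dir, num_ranks=24):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [
        name
        for name in expected_checkpoint_files(num_ranks)
        if not (root / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_checkpoint_files("./checkpoint-00008706-20260501_081812/")
    print("missing files:", missing if missing else "none")
```

This only covers the `.pt` files from the table; any other files in the repository are not checked.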
+ ## Related archives
+
+ - `haotiansun014/qwen3-4b-stage3-nce-7335-lastround-archive`: same training
+ recipe but with `combine=lastround` (final-round score, not per-round).
+ - `haotiansun014/qwen3-4b-stage3-nce-7335-temp07-archive`: `softmax_sampling`
+ proposal at T=0.7 (vs. argmax in beam-perround).