Base Models Know How to Reason, Thinking Models Learn When
Paper • 2510.07364
TopK Sparse Autoencoders trained on L17 residual-stream activations of Qwen/Qwen3.5-35B-A3B thinking traces.
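As a rough illustration of what a TopK SAE computes, here is a minimal NumPy sketch of the forward pass. It is not the training code behind these checkpoints: the parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`), the decoder-bias subtraction before encoding, the ReLU on the kept values, the toy dimensions, and `k = 3` are all assumptions made for the example.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode x, keep the k largest pre-activations per row, then decode.

    x: (batch, d_model). Returns (x_hat, z) where z is the sparse code.
    """
    pre = (x - b_dec) @ W_enc + b_enc               # (batch, n_latents)
    # Column indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)  # ReLU on the kept values
    x_hat = z @ W_dec + b_dec                       # (batch, d_model)
    return x_hat, z

# Toy example with hypothetical sizes: d_model=8, n=15 latents, k=3.
rng = np.random.default_rng(0)
d_model, n_latents, k = 8, 15, 3
x = rng.normal(size=(4, d_model))
W_enc = rng.normal(size=(d_model, n_latents))
W_dec = rng.normal(size=(n_latents, d_model))
b_enc = np.zeros(n_latents)
b_dec = np.zeros(d_model)

x_hat, z = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
```

Each row of `z` has at most `k` nonzero entries, so every activation is reconstructed from a handful of dictionary directions.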
2e-4 / sqrt(n/16384)

| n | val MSE | var explained | cos sim |
|---|---|---|---|
| 5 | 0.0007 | 0.066 | 0.870 |
| 10 | 0.0007 | 0.090 | 0.877 |
| 15 | 0.0007 | 0.110 | 0.882 |
| 20 | 0.0007 | 0.123 | 0.886 |
| 25 | 0.0007 | 0.132 | 0.888 |
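The card does not define the three metric columns, so the following is only a sketch under common conventions: MSE averaged over all elements, variance explained as one minus the ratio of residual to mean-centered sum of squares, and cosine similarity averaged per example. The function name `reconstruction_metrics` is hypothetical.

```python
import numpy as np

def reconstruction_metrics(x, x_hat):
    """Return (mse, variance_explained, mean_cosine_similarity).

    x, x_hat: arrays of shape (n_examples, d_model).
    """
    err = x - x_hat
    mse = float(np.mean(err ** 2))
    # Variance explained relative to predicting the per-dimension mean.
    total = np.sum((x - x.mean(axis=0)) ** 2)
    var_explained = float(1.0 - np.sum(err ** 2) / total)
    # Cosine similarity per example, then averaged.
    num = np.sum(x * x_hat, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1)
    cos = float(np.mean(num / den))
    return mse, var_explained, cos

x = np.array([[1.0, 2.0], [3.0, 4.0]])
perfect = reconstruction_metrics(x, x.copy())
# Baseline that predicts the dataset mean for every example.
mean_only = reconstruction_metrics(x, np.tile(x.mean(axis=0), (2, 1)))
```

Under these definitions the mean-only baseline scores zero variance explained yet still has a high cosine similarity, because the activations share a large common direction. That may be part of why the table can show low variance explained alongside cosine similarity near 0.88.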
Selected (elbow): n=15
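The card does not say how the elbow was located. One simple heuristic, sketched here as an assumption, is to pick the point where the marginal gain in variance explained drops the most (the most negative second difference). Applied to the table above, it lands on the same value.

```python
ns = [5, 10, 15, 20, 25]
var_explained = [0.066, 0.090, 0.110, 0.123, 0.132]

def elbow_by_second_difference(xs, ys):
    """Return the x where the marginal gain in y falls off most sharply.

    Assumes xs are evenly spaced and len(ys) >= 3.
    """
    gains = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
    drops = [gains[i + 1] - gains[i] for i in range(len(gains) - 1)]
    i = min(range(len(drops)), key=drops.__getitem__)
    return xs[i + 1]  # drops[i] is centered on xs[i + 1]

elbow = elbow_by_second_difference(ns, var_explained)
print(elbow)  # 15
```

With only five points the curve is smooth, so other elbow criteria could reasonably choose a neighboring value.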

- `sae_n{5,10,15,20,25}.pt`: all trained SAEs
- `cluster_data_n15.json`: per-cluster top-100 + random-100 sentences (for LLM labeling in Phase 3)
- `summary.json`: training metrics per dict size
- `sweep.png`: MSE vs var-explained curves

Label clusters with GPT-4o-mini using top-100 activating sentences → 10-20 named reasoning categories. Then train per-category steering vectors (Phase 4) and run hybrid inference on MATH500/GSM8K (Phase 5).
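The labeling step could start from a prompt builder along these lines. This is a sketch only: the function name, the prompt wording, and the idea of passing sentences as a plain list are assumptions, since the schema of `cluster_data_n15.json` is not documented here, and the call to the labeling model is left out.

```python
def build_label_prompt(cluster_id, top_sentences, max_examples=100):
    """Format a cluster's top-activating sentences into a labeling prompt."""
    examples = top_sentences[:max_examples]
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(examples))
    return (
        f"These sentences most strongly activate cluster {cluster_id} "
        "of a sparse autoencoder trained on reasoning traces.\n"
        "Give a short name and a one-sentence description of the "
        "reasoning behavior they share.\n\n"
        f"{numbered}"
    )

# Hypothetical sentences standing in for real cluster data.
prompt = build_label_prompt(
    3,
    ["Wait, let me double-check that.", "Let me verify the previous step."],
)
```

Including the random-100 sentences as contrast examples in the same prompt would be a natural extension, though whether the original pipeline does so is not stated.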
Full replication pipeline: https://huggingface.co/datasets/caiovicentino1/Qwen3.6-35B-A3B-mcr-stage-b
License: MIT.