Qwen3.5-35B-A3B TopK SAEs: Phase 2 (Replication of Venhoff et al. 2025)

TopK sparse autoencoders (SAEs) trained on layer-17 (L17) residual-stream activations of Qwen/Qwen3.5-35B-A3B thinking traces.

Recipe (Venhoff et al. 2025, arXiv:2510.07364)

  • TopK activation, k=3
  • Decoder row-normalized after each step
  • MSE loss (sparsity via TopK, no L1 penalty)
  • TinySAE lr schedule: 2e-4 / sqrt(n/16384)
  • Adam, batch 512, max 300 epochs, patience 10, 90/10 train/val split
  • Activation source: sentence-level mean-pool (41285 sentences from 2000 MMLU-Pro prompts)
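
The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not the training code used for these checkpoints: the function names, the weight shapes, the presence of bias terms, and the toy width d_model=8 are all assumptions made for the example. It shows the learning-rate formula, decoder row normalization, and a TopK forward pass with k=3.

```python
import math
import numpy as np

def tinysae_lr(n_latents, base_lr=2e-4, ref_size=16384):
    """Learning rate from the recipe: 2e-4 / sqrt(n / 16384)."""
    return base_lr / math.sqrt(n_latents / ref_size)

def normalize_decoder_rows(W_dec):
    """Rescale each decoder row (one row per latent) to unit L2 norm."""
    norms = np.linalg.norm(W_dec, axis=1, keepdims=True)
    return W_dec / np.maximum(norms, 1e-8)

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=3):
    """Encode, keep the k largest pre-activations per sample, decode."""
    pre = x @ W_enc + b_enc                              # (batch, n)
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]   # top-k indices per row
    z = np.zeros_like(pre)
    np.put_along_axis(z, top_idx, np.take_along_axis(pre, top_idx, axis=1), axis=1)
    x_hat = z @ W_dec + b_dec                            # (batch, d_model)
    return x_hat, z

# Toy setup (assumed shapes): d_model=8 is NOT the model's real hidden size.
rng = np.random.default_rng(0)
d_model, n, k = 8, 15, 3
W_enc = rng.normal(size=(d_model, n))
W_dec = normalize_decoder_rows(rng.normal(size=(n, d_model)))
x = rng.normal(size=(4, d_model))

x_hat, z = topk_sae_forward(x, W_enc, np.zeros(n), W_dec, np.zeros(d_model), k)
mse = float(np.mean((x - x_hat) ** 2))  # MSE is the only loss term; no L1 penalty
print("lr:", tinysae_lr(n), "active latents per row:", (z != 0).sum(axis=1), "mse:", mse)
```

In a training loop, normalize_decoder_rows would be applied to the decoder after every optimizer step, matching the "row-normalized after each step" item.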

Dict sizes swept

n     val MSE    var explained    cos sim
5     0.0007     0.066            0.870
10    0.0007     0.090            0.877
15    0.0007     0.110            0.882
20    0.0007     0.123            0.886
25    0.0007     0.132            0.888
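
The model card does not define the three metric columns, so the sketch below uses one common set of definitions as an assumption: MSE averaged over all elements, variance explained as 1 - MSE / (per-dimension variance around the dataset mean), and cosine similarity averaged per sample. The function name reconstruction_metrics is hypothetical. Under these definitions, cosine similarity is not mean-centered while variance explained is, so a high cosine similarity can coexist with a low variance explained when activations share a large common offset.

```python
import numpy as np

def reconstruction_metrics(x, x_hat):
    """Assumed metric definitions; the repository may compute them differently."""
    mse = float(np.mean((x - x_hat) ** 2))
    # Variance of the data around its per-dimension mean.
    total_var = float(np.mean((x - x.mean(axis=0, keepdims=True)) ** 2))
    var_explained = 1.0 - mse / total_var
    # Per-sample cosine similarity, then averaged over samples.
    dots = np.sum(x * x_hat, axis=1)
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1)
    cos_sim = float(np.mean(dots / np.maximum(norms, 1e-12)))
    return {"val_mse": mse, "var_explained": var_explained, "cos_sim": cos_sim}

# Synthetic illustration: data with a large shared offset, reconstructed by
# its mean alone. Cosine similarity is high, variance explained is about zero.
rng = np.random.default_rng(0)
x = 10.0 + rng.normal(size=(256, 16))
x_hat = np.broadcast_to(x.mean(axis=0), x.shape)
metrics = reconstruction_metrics(x, x_hat)
print(metrics)
```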

Selected (elbow): n=15
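
The exact elbow criterion is not stated here. As one plausible reading, the sketch below takes the variance-explained column from the sweep table, computes the marginal gain of each step up in dictionary size, and picks the size after which that gain falls the most. The heuristic itself is an assumption; only the numbers come from the table.

```python
sizes = [5, 10, 15, 20, 25]
var_explained = [0.066, 0.090, 0.110, 0.123, 0.132]  # from the sweep table

# Marginal gain in variance explained for each step up in dictionary size.
gains = [round(b - a, 3) for a, b in zip(var_explained, var_explained[1:])]

# How much the marginal gain shrinks from one step to the next.
drops = [round(g0 - g1, 3) for g0, g1 in zip(gains, gains[1:])]

# drops[i] compares the step into sizes[i + 1] with the step out of it,
# so the largest drop marks sizes[i + 1] as the elbow.
elbow_idx = max(range(len(drops)), key=drops.__getitem__) + 1
elbow_n = sizes[elbow_idx]

print("gains:", gains)
print("drops:", drops)
print("elbow:", elbow_n)
```

With the table's values, the gain falls from 0.020 (10 to 15) to 0.013 (15 to 20), the largest single drop, which is consistent with the n=15 selection.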

Files

  • sae_n{5,10,15,20,25}.pt: all trained SAEs
  • cluster_data_n15.json: per-cluster top-100 + random-100 sentences (for LLM labeling in Phase 3)
  • summary.json: training metrics per dict size
  • sweep.png: MSE vs var-explained curves

Next (Phase 3)

Label clusters with GPT-4o-mini using top-100 activating sentences → 10-20 named reasoning categories. Then train per-category steering vectors (Phase 4) and run hybrid inference on MATH500/GSM8K (Phase 5).

Paper tracking

Full replication pipeline: https://huggingface.co/datasets/caiovicentino1/Qwen3.6-35B-A3B-mcr-stage-b

License: MIT.
