gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17
BatchTopK sparse autoencoder (SAE) trained on residual-stream activations from Gemma 4 E2B at layer 17 (relative depth ≈ 49 %). Activations come from FineWeb-Edu (pretraining-distribution) text, with the base model loaded under bitsandbytes 4-bit NF4 quantization.
Training progress: Checkpoint 2 (2026-04-30)
| Field | Value |
|---|---|
| Tokens seen | 16,001,024 |
| Target total | 100,000,000 |
| Progress | 16.0 % |
| Training steps | 15,626 |
| Last checkpoint | 2026-04-30T10:38:59 UTC |
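The token count above is consistent with the step count and the batch size listed under Hyperparameters. A minimal check, assuming each activation in a batch corresponds to one token (the variable names are illustrative, not taken from the training code):

```python
# Values copied from the tables in this card.
training_steps = 15_626
batch_size = 1_024            # activations per step; assumed to be one per token
target_tokens = 100_000_000

tokens_seen = training_steps * batch_size
progress_pct = 100 * tokens_seen / target_tokens

print(tokens_seen)               # 16001024
print(f"{progress_pct:.1f} %")   # 16.0 %
```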
Training metrics (Checkpoint 2)
| Metric | Checkpoint 1 (~8M tok) | Checkpoint 2 (16M tok) |
|---|---|---|
| Loss | ~0.654 | 0.586 |
| Explained variance | ~0.770 | 0.831 |
| Peak EV (Ckpt 2) | n/a | 0.849 @ step ~15,350 |
| L0 | 64 | 64 |
| Alive features (frac) | ~62 % | ~62 % |
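For readers unfamiliar with these metrics, the sketch below shows one common way to compute explained variance, L0, and the alive-feature fraction. These definitions are assumptions; the card does not state the exact formulas or counting window used in training, and the function names are hypothetical.

```python
import torch


def explained_variance(x: torch.Tensor, x_hat: torch.Tensor) -> float:
    """1 - (residual sum of squares) / (total sum of squares around the batch mean)."""
    residual = (x - x_hat).pow(2).sum()
    total = (x - x.mean(dim=0, keepdim=True)).pow(2).sum()
    return float(1.0 - residual / total)


def mean_l0(feature_acts: torch.Tensor) -> float:
    """Average number of non-zero features per example."""
    return float((feature_acts != 0).sum(dim=-1).float().mean())


def alive_fraction(fired_counts: torch.Tensor) -> float:
    """Fraction of features that fired at least once in the counting window."""
    return float((fired_counts > 0).float().mean())


# Toy inputs, not real activations.
torch.manual_seed(0)
x = torch.randn(8, 16)
ev_perfect = explained_variance(x, x)  # perfect reconstruction
ev_mean = explained_variance(x, x.mean(dim=0, keepdim=True).expand_as(x))  # mean baseline
l0 = mean_l0(torch.tensor([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]))
alive = alive_fraction(torch.tensor([0, 3, 1, 0]))
```

Under these definitions a perfect reconstruction scores 1 and predicting the batch mean scores 0, so the reported 0.831 sits between the two.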
Training is ongoing; weights update with each checkpoint push.
Hyperparameters
| Field | Value |
|---|---|
| Architecture | BatchTopK (Bussmann et al. arXiv:2412.06410) |
| d_in | 1536 |
| d_sae | 24576 (16× expansion) |
| k | 64 |
| Seed | 17 |
| Layer | 17 |
| Base model | google/gemma-4-E2B |
| Quantization | bitsandbytes NF4, fp16 compute |
| Optimizer | Adam, lr=3e-4 |
| Batch size | 1024 activations |
| Dataset | HuggingFaceFW/fineweb-edu (sample-10BT), streaming, seed=17 |
| Aux-k coefficient | 0.0625 |
| Decoder norm | Unit-norm per Gemma Scope recipe |
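BatchTopK keeps the `k × batch_size` largest activations across the whole batch rather than the top `k` per example, so the average L0 is `k` while individual examples can use more or fewer features. The sketch below illustrates that selection step and a unit-norm decoder constraint. It is not the training code for this SAE: the function names are hypothetical, and when the decoder is renormalized during training is an assumption.

```python
import torch


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest ReLU activations across the whole batch."""
    batch_size = pre_acts.shape[0]
    acts = torch.relu(pre_acts)
    flat = acts.flatten()
    top = torch.topk(flat, k * batch_size)
    sparse = torch.zeros_like(flat)
    sparse[top.indices] = top.values
    return sparse.reshape(acts.shape)


def normalize_decoder_rows(W_dec: torch.Tensor) -> torch.Tensor:
    """Rescale each decoder row (one per feature) to unit L2 norm."""
    return W_dec / W_dec.norm(dim=1, keepdim=True).clamp_min(1e-8)


# Toy sizes for illustration; the real SAE uses d_in=1536, d_sae=24576, k=64.
torch.manual_seed(17)
pre = torch.randn(4, 32)          # [batch, d_sae]
sparse_acts = batch_topk(pre, k=3)
per_example_l0 = (sparse_acts != 0).sum(dim=-1)  # can differ from row to row

W_dec = normalize_decoder_rows(torch.randn(32, 8))  # [d_sae, d_in]
```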
Usage
Load weights with SAELens-compatible state-dict keys:
```python
import json

import torch
from huggingface_hub import hf_hub_download

repo = "Solshine/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17"

# Download and parse the SAE config.
with open(hf_hub_download(repo, "cfg.json")) as f:
    cfg = json.load(f)

# Load the weights on CPU; weights_only=True avoids unpickling arbitrary objects.
state = torch.load(hf_hub_download(repo, "sae_weights.pt"), map_location="cpu", weights_only=True)
# Keys: W_enc [d_in, d_sae], W_dec [d_sae, d_in], b_enc [d_sae], b_dec [d_in]
```
Hook into Gemma 4 E2B at layer 17 to collect residual-stream activations, then encode them with the SAE. At inference, apply TopK per example rather than across the batch.
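A minimal sketch of those two steps, under several assumptions: the helper names (`encode_topk`, `decode`, `capture_output`) are hypothetical, the encoder is taken to be `relu(x @ W_enc + b_enc)`, and whether `b_dec` is subtracted from the input first should be checked against `cfg.json`. The module path to layer 17 depends on the installed `transformers` version, so the example hooks a stand-in `nn.Linear` and uses random weights in place of the downloaded state dict.

```python
import torch
from torch import nn


def encode_topk(x, state, k, subtract_b_dec=False):
    """Per-example TopK: keep the k largest ReLU activations along the feature axis."""
    if subtract_b_dec:  # assumption: confirm the convention in cfg.json
        x = x - state["b_dec"]
    acts = torch.relu(x @ state["W_enc"] + state["b_enc"])
    top = torch.topk(acts, k, dim=-1)
    return torch.zeros_like(acts).scatter_(-1, top.indices, top.values)


def decode(feature_acts, state):
    """Map sparse feature activations back to the residual stream."""
    return feature_acts @ state["W_dec"] + state["b_dec"]


def capture_output(module: nn.Module, store: list):
    """Register a forward hook that appends the module's output to `store`."""
    def hook(_module, _inputs, output):
        # Decoder layers often return a tuple whose first item is the hidden state.
        hidden = output[0] if isinstance(output, tuple) else output
        store.append(hidden.detach().float().cpu())
    return module.register_forward_hook(hook)


# Toy stand-ins: replace `layer` with the model's layer-17 module and
# `state` with the loaded SAE weights (d_in=1536, d_sae=24576, k=64).
torch.manual_seed(0)
d_in, d_sae, k = 8, 32, 4
state = {
    "W_enc": torch.randn(d_in, d_sae),
    "W_dec": torch.randn(d_sae, d_in),
    "b_enc": torch.zeros(d_sae),
    "b_dec": torch.zeros(d_in),
}
layer = nn.Linear(d_in, d_in)

store = []
handle = capture_output(layer, store)
layer(torch.randn(5, d_in))
handle.remove()  # always remove the hook once activations are collected

feats = encode_topk(store[0], state, k)
recon = decode(feats, state)
```

The same functions accept `[batch, seq, d_in]` activations, since the matrix product and the TopK both act on the last axis.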
Research context
This SAE is part of an ongoing deception-interpretability research program examining whether behavioral distinctions (honest vs. deceptive model outputs) leave recoverable traces in SAE feature space. Training on the pretraining distribution (FineWeb-Edu) establishes a general-purpose feature vocabulary for Gemma 4 E2B; subsequent experiments probe this vocabulary against decision-incentive behavioral scenarios.
Live W&B run: https://wandb.ai/caleb-deleeuw/gemma4-sae-scope/runs/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17