gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17

BatchTopK sparse autoencoder (SAE) trained on layer-17 residual-stream activations (relative depth ≈ 49 %) from Gemma 4 E2B. Activations were collected on FineWeb-Edu (pretraining-distribution) text, with the base model loaded under bitsandbytes 4-bit NF4 quantization.

Training progress — Checkpoint 2 (2026-04-30)

Field	Value
Tokens seen	16,001,024
Target total	100,000,000
Progress	16.0 %
Training steps	15,626
Last checkpoint	2026-04-30T10:38:59 UTC
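
As a quick consistency check, the token count above follows from the step count if each training step consumes one batch of 1024 activations (the batch size listed under Hyperparameters) and each activation corresponds to one token. The one-activation-per-token reading is an assumption, not something this card states:

```python
# Assumption: one training step = one batch of 1024 activations,
# and one activation = one token.
steps = 15_626
batch_size = 1_024
target_tokens = 100_000_000

tokens_seen = steps * batch_size
progress_pct = 100 * tokens_seen / target_tokens

print(f"{tokens_seen:,}")       # 16,001,024
print(f"{progress_pct:.1f} %")  # 16.0 %
```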

Training metrics (Checkpoint 2)

Metric	Checkpoint 1 (~8M tok)	Checkpoint 2 (16M tok)
Loss	~0.654	0.586
Explained variance	~0.770	0.831
Peak EV (Ckpt 2)	—	0.849 @ step ~15,350
L0	64	64
Alive features (frac)	~62 %	~62 %
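
This card does not say how explained variance, L0, or the alive fraction are computed. The sketch below assumes one common convention: explained variance as one minus the ratio of residual to total sum of squares about the batch mean, L0 as the mean number of non-zero latents per example, and "alive" as "non-zero on at least one example in the window". The function names are illustrative, and the run's own logging code may differ.

```python
import torch

def explained_variance(x: torch.Tensor, x_hat: torch.Tensor) -> float:
    """1 - (residual sum of squares) / (total sum of squares about the batch mean)."""
    resid = (x - x_hat).pow(2).sum()
    total = (x - x.mean(dim=0, keepdim=True)).pow(2).sum()
    return float(1.0 - resid / total)

def mean_l0(acts: torch.Tensor) -> float:
    """Average number of non-zero latents per row of `acts` [n, d_sae]."""
    return float((acts != 0).sum(dim=-1).float().mean())

def alive_fraction(acts: torch.Tensor) -> float:
    """Fraction of latents that are non-zero on at least one row of `acts`."""
    return float((acts != 0).any(dim=0).float().mean())

# Toy check with random data: a perfect reconstruction explains all variance,
# and predicting the batch mean explains none of it.
torch.manual_seed(17)
x = torch.randn(128, 16)
ev_perfect = explained_variance(x, x)
ev_mean = explained_variance(x, x.mean(dim=0, keepdim=True).expand_as(x))

# Toy latent matrix: 4 examples, 6 latents, latents 4 and 5 never fire.
acts = torch.zeros(4, 6)
acts[:, 0] = 1.0
acts[0, 1] = 2.0
acts[1, 2] = 3.0
acts[2, 3] = 4.0
toy_alive = alive_fraction(acts)   # 4 of 6 latents fire at least once
toy_l0 = mean_l0(acts)             # (2 + 2 + 2 + 1) / 4 non-zeros per row
```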

Training is ongoing — weights update with each checkpoint push.

Hyperparameters

Parameter	Value
Architecture	BatchTopK (Bussmann et al., arXiv:2412.06410)
d_in	1536
d_sae	24576 (16× expansion)
k	64
Seed	17
Layer	17
Base model	google/gemma-4-E2B
Quantization	bitsandbytes NF4, fp16 compute
Optimizer	Adam, lr=3e-4
Batch size	1024 activations
Dataset	HuggingFaceFW/fineweb-edu (sample-10BT), streaming, seed=17
Aux-k coefficient	0.0625
Decoder norm	Unit-norm per Gemma Scope recipe
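
For readers unfamiliar with the architecture named above, here is a minimal sketch of the BatchTopK selection rule as described by Bussmann et al. Instead of keeping the k largest latents of each example, it keeps the k × batch_size largest latents across the whole batch, so individual examples may use more or fewer than k latents while the batch-average L0 stays at k. The function name and toy shapes are illustrative; this is not the training code behind this checkpoint.

```python
import torch

def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest entries of a [batch, d_sae] matrix; zero the rest."""
    batch = pre_acts.shape[0]
    flat = pre_acts.flatten()
    vals, idx = flat.topk(k * batch)   # selection is over the whole batch at once
    out = torch.zeros_like(flat)
    out[idx] = vals
    return out.view_as(pre_acts)

torch.manual_seed(17)
pre = torch.relu(torch.randn(8, 256))  # toy non-negative pre-activations
acts = batch_topk(pre, k=4)

per_example_l0 = (acts != 0).sum(dim=-1)
print(per_example_l0.tolist())               # counts can differ from row to row
print(float(per_example_l0.float().mean()))  # batch-average L0 equals k
```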

Usage

Load weights with SAELens-compatible state-dict keys:

import json

import torch
from huggingface_hub import hf_hub_download

repo = "Solshine/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17"

# Download the config (or reuse the cached copy) and parse it as JSON.
with open(hf_hub_download(repo, "cfg.json")) as f:
    cfg = json.load(f)

# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load(hf_hub_download(repo, "sae_weights.pt"), map_location="cpu", weights_only=True)
# Keys: W_enc [d_in, d_sae], W_dec [d_sae, d_in], b_enc [d_sae], b_dec [d_in]

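To use the SAE, hook Gemma 4 E2B at layer 17 to collect residual-stream activations, then encode them with the SAE. At inference, apply TopK per example (keep the k = 64 largest latents of each activation vector) rather than across the batch as in training.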
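
A minimal sketch of that inference path follows, with several assumptions flagged. The encoder is taken to be ReLU((x - b_dec) @ W_enc + b_enc), a common SAE convention that this card does not confirm. Random tensors at toy sizes stand in for the downloaded `state` dict, and a small `nn.Linear` stands in for the layer-17 block so the hook can run without the model. With the real model you would register the same kind of forward hook on the layer-17 decoder block, whose attribute path depends on the installed `transformers` version, and use d_in = 1536, d_sae = 24576, k = 64.

```python
import torch
from torch import nn

# Toy sizes so the sketch runs quickly; the real SAE uses 1536, 24576, 64.
D_IN, D_SAE, K = 32, 512, 8

def encode_topk(x: torch.Tensor, state: dict, k: int = K) -> torch.Tensor:
    """Per-example TopK: keep the k largest latents of each row, zero the rest."""
    # Assumption: the decoder bias is subtracted before encoding.
    pre = torch.relu((x - state["b_dec"]) @ state["W_enc"] + state["b_enc"])
    vals, idx = pre.topk(k, dim=-1)
    return torch.zeros_like(pre).scatter_(-1, idx, vals)

def decode(acts: torch.Tensor, state: dict) -> torch.Tensor:
    """Reconstruct residual-stream activations from sparse latents."""
    return acts @ state["W_dec"] + state["b_dec"]

# Random stand-in weights with the key names and shapes listed above.
torch.manual_seed(17)
state = {
    "W_enc": torch.randn(D_IN, D_SAE) / D_IN**0.5,
    "W_dec": torch.randn(D_SAE, D_IN) / D_SAE**0.5,
    "b_enc": torch.zeros(D_SAE),
    "b_dec": torch.zeros(D_IN),
}

captured = {}

def save_output(module, inputs, output):
    # Decoder layers may return a tuple whose first item is the hidden state.
    hidden = output[0] if isinstance(output, tuple) else output
    captured["resid"] = hidden.detach()

layer = nn.Linear(D_IN, D_IN)  # hypothetical stand-in for the layer-17 block
handle = layer.register_forward_hook(save_output)
with torch.no_grad():
    layer(torch.randn(2, 5, D_IN))  # [batch, seq, d_in]
handle.remove()

resid = captured["resid"].reshape(-1, D_IN).float()  # one row per token
acts = encode_topk(resid, state)
recon = decode(acts, state)
print(acts.shape, recon.shape)
print((acts != 0).sum(dim=-1))  # at most K non-zero latents per token
```

Flattening to one row per token before encoding keeps the TopK selection independent for each token, which is what "per-example" means here.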

Research context

This SAE is part of an ongoing deception-interpretability research program examining whether behavioral distinctions (honest vs. deceptive model outputs) leave recoverable traces in SAE feature space. Training on the pretraining distribution (FineWeb-Edu) establishes a general-purpose feature vocabulary for Gemma 4 E2B; subsequent experiments probe this vocabulary against decision-incentive behavioral scenarios.

Live W&B run: https://wandb.ai/caleb-deleeuw/gemma4-sae-scope/runs/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17

Paper: BatchTopK Sparse Autoencoders, arXiv:2412.06410, published Dec 9, 2024