gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17

BatchTopK Sparse Autoencoder (SAE) trained on residual-stream activations from Gemma 4 E2B at layer 17 (relative depth ≈ 49 %), using FineWeb-Edu (pretraining-distribution) text, with the base model loaded under bitsandbytes 4-bit NF4 quantization.

Training progress: Checkpoint 2 (2026-04-30)

| Field | Value |
| --- | --- |
| Tokens seen | 16,001,024 |
| Target total | 100,000,000 |
| Progress | 16.0 % |
| Training steps | 15,626 |
| Last checkpoint | 2026-04-30T10:38:59 UTC |
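
The progress figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes one SAE training activation per token and uses the batch size of 1024 listed under Hyperparameters; the variable names are illustrative only.

```python
import math

steps = 15_626               # "Training steps" from the progress table
batch_size = 1_024           # activations per step, from the Hyperparameters section
target_tokens = 100_000_000  # "Target total" from the progress table

# Assumption: each activation in a batch corresponds to exactly one token.
tokens_seen = steps * batch_size
progress = tokens_seen / target_tokens
steps_remaining = math.ceil((target_tokens - tokens_seen) / batch_size)

print(f"{tokens_seen:,} tokens, {progress:.1%} of target, {steps_remaining:,} steps remaining")
# prints "16,001,024 tokens, 16.0% of target, 82,031 steps remaining"
```

Under that assumption the step count and token count in the table agree exactly.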

Training metrics (Checkpoint 2)

| Metric | Checkpoint 1 (~8M tok) | Checkpoint 2 (16M tok) |
| --- | --- | --- |
| Loss | ~0.654 | 0.586 |
| Explained variance | ~0.770 | 0.831 |
| Peak EV (Ckpt 2) | - | 0.849 @ step ~15,350 |
| L0 | 64 | 64 |
| Alive features (frac) | ~62 % | ~62 % |

Training is ongoing; weights update with each checkpoint push.
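
For readers who want to reproduce comparable numbers on their own activations, the sketch below shows one common way to compute explained variance, L0, and the alive-feature fraction. These definitions are assumptions, not necessarily the ones used in this training run, and the function names are hypothetical.

```python
import torch


def explained_variance(x, x_hat):
    """1 - (residual sum of squares) / (total sum of squares around the batch mean)."""
    resid = (x - x_hat).pow(2).sum()
    total = (x - x.mean(dim=0)).pow(2).sum()
    return 1.0 - (resid / total).item()


def mean_l0(acts):
    """Average number of non-zero SAE features per example."""
    return (acts != 0).sum(dim=-1).float().mean().item()


def alive_fraction(fire_counts):
    """Fraction of features that fired at least once; fire_counts has shape [d_sae]."""
    return (fire_counts > 0).float().mean().item()
```

Here `x` and `x_hat` are `[batch, d_in]` tensors of original and reconstructed activations, `acts` is the `[batch, d_sae]` matrix of SAE feature activations, and `fire_counts` would be accumulated over some evaluation window whose length is up to the user.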

Hyperparameters

| Parameter | Value |
| --- | --- |
| Architecture | BatchTopK (Bussmann et al., arXiv:2412.06410) |
| d_in | 1536 |
| d_sae | 24576 (16× expansion) |
| k | 64 |
| Seed | 17 |
| Layer | 17 |
| Base model | google/gemma-4-E2B |
| Quantization | bitsandbytes NF4, fp16 compute |
| Optimizer | Adam, lr=3e-4 |
| Batch size | 1024 activations |
| Dataset | HuggingFaceFW/fineweb-edu (sample-10BT), streaming, seed=17 |
| Aux-k coefficient | 0.0625 |
| Decoder norm | Unit-norm per Gemma Scope recipe |
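
To make two of these entries concrete, the sketch below illustrates batch-level TopK selection as described by Bussmann et al. (keep the `k * batch_size` largest activations across the whole batch, so individual examples may use more or fewer than `k` features) and a unit-norm decoder constraint. It is an illustration under stated assumptions, not this run's training code: the helper names are hypothetical, and `pre` is assumed to hold non-negative (post-ReLU) pre-activations.

```python
import torch


def batch_topk(pre, k):
    """Keep the k * batch_size largest entries of pre ([batch, d_sae]); zero the rest."""
    batch_size = pre.shape[0]
    flat = pre.flatten()
    top = torch.topk(flat, k=k * batch_size)
    out = torch.zeros_like(flat).scatter_(0, top.indices, top.values)
    return out.view_as(pre)


def normalize_decoder(W_dec):
    """Rescale each decoder row ([d_sae, d_in]) to unit L2 norm."""
    return W_dec / W_dec.norm(dim=-1, keepdim=True).clamp_min(1e-8)
```

With `k = 64` and a batch of 1024 activations, this selection keeps 65,536 activations per step, which averages to an L0 of 64 per example.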

Usage

Load weights with SAELens-compatible state-dict keys:

import json

import torch
from huggingface_hub import hf_hub_download

repo = "Solshine/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17"

# Download the config (or reuse the cached copy) and parse it into a dict.
with open(hf_hub_download(repo, "cfg.json")) as f:
    cfg = json.load(f)

# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load(hf_hub_download(repo, "sae_weights.pt"), map_location="cpu", weights_only=True)
# Keys: W_enc [d_in, d_sae], W_dec [d_sae, d_in], b_enc [d_sae], b_dec [d_in]

Hook into Gemma 4 E2B at layer 17 to collect residual-stream activations, then encode them with the SAE. At inference time, apply TopK per example (keep the 64 largest activations for each input vector) rather than across the batch as in training.
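
The steps above might look roughly like the following sketch. Several details are assumptions rather than facts from this card: the encoder is taken to be `ReLU((x - b_dec) @ W_enc + b_enc)` (check `cfg.json` and the training code before relying on this), the helper names are hypothetical, and the layer module is passed in by the caller because the exact attribute path to layer 17 depends on the `transformers` model class.

```python
import torch


def capture_output(module, store):
    """Register a forward hook that appends the module's output hidden states to store."""
    def hook(_module, _inputs, output):
        # Decoder layers often return a tuple whose first item is the hidden state.
        hidden = output[0] if isinstance(output, tuple) else output
        store.append(hidden.detach())
    return module.register_forward_hook(hook)


def encode_topk(x, state, k=64):
    """Per-example TopK encoding of x ([..., d_in]) into SAE features ([..., d_sae])."""
    x = x.to(state["W_enc"].dtype)
    # ASSUMPTION: pre-activations subtract b_dec and pass through a ReLU.
    pre = torch.relu((x - state["b_dec"]) @ state["W_enc"] + state["b_enc"])
    # Keep the k largest pre-activations for each vector and zero out the rest.
    top = torch.topk(pre, k=k, dim=-1)
    return torch.zeros_like(pre).scatter_(-1, top.indices, top.values)


def decode(acts, state):
    """Reconstruct activations from SAE features."""
    return acts @ state["W_dec"] + state["b_dec"]
```

A caller would register the hook on the layer-17 decoder block, run a forward pass, call `.remove()` on the returned handle, and then pass the captured tensor to `encode_topk` together with the `state` dict loaded earlier.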

Research context

This SAE is part of an ongoing deception-interpretability research program examining whether behavioral distinctions (honest vs. deceptive model outputs) leave recoverable traces in SAE feature space. Training on the pretraining distribution (FineWeb-Edu) establishes a general-purpose feature vocabulary for Gemma 4 E2B; subsequent experiments probe this vocabulary against decision-incentive behavioral scenarios.

Live W&B run: https://wandb.ai/caleb-deleeuw/gemma4-sae-scope/runs/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17
