---
license: apache-2.0
tags:
- sparse-autoencoder
- mechanistic-interpretability
- sae-lens
- gemma-4
- batch-topk
base_model: google/gemma-4-E2B
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17

BatchTopK sparse autoencoder (SAE) trained on residual-stream activations from
**Gemma 4 E2B** at **layer 17** (relative depth ≈ 49 %). Activations are
collected on FineWeb-Edu text (pretraining distribution) with the base model
loaded under bitsandbytes 4-bit NF4 quantization.

## Training progress — Checkpoint 2 (2026-04-30)

| Field | Value |
|---|---|
| Tokens seen | 16,001,024 |
| Target total | 100,000,000 |
| Progress | 16.0 % |
| Training steps | 15,626 |
| Last checkpoint | 2026-04-30T10:38:59 UTC |
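
The token count and progress figure follow from the step count if each training step consumes one batch of 1,024 activations (the batch size listed under Hyperparameters). A quick check, assuming one activation per token and no gradient accumulation, neither of which is confirmed by the card:

```python
# Sanity-check the progress table.
# Assumptions: one batch per step, one activation per token.
steps = 15_626
batch_size = 1_024            # from the Hyperparameters table
target_tokens = 100_000_000

tokens_seen = steps * batch_size
progress_pct = 100 * tokens_seen / target_tokens

print(f"{tokens_seen:,} tokens, {progress_pct:.1f} % of target")
# prints: 16,001,024 tokens, 16.0 % of target
```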

## Training metrics (Checkpoint 2)

| Metric | Checkpoint 1 (~8M tok) | Checkpoint 2 (16M tok) |
|---|---|---|
| Loss | ~0.654 | **0.586** |
| Explained variance | ~0.770 | **0.831** |
| Peak EV (Ckpt 2) | — | 0.849 @ step ~15,350 |
| L0 | 64 | 64 |
| Alive features (frac) | ~62 % | ~62 % |
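
The card does not state how explained variance and the alive-feature fraction are computed. The sketch below shows one common definition of each; the function names are hypothetical and the training code may differ (for example, alive features are often tracked over a window of recent steps rather than a single batch).

```python
import torch


def explained_variance(x: torch.Tensor, x_hat: torch.Tensor) -> float:
    """1 - residual sum of squares / total sum of squares about the batch mean.

    x, x_hat: [n, d_in] original activations and SAE reconstructions.
    """
    resid = (x - x_hat).pow(2).sum()
    total = (x - x.mean(dim=0, keepdim=True)).pow(2).sum()
    return (1.0 - resid / total).item()


def alive_fraction(acts: torch.Tensor) -> float:
    """Fraction of features that fire (> 0) at least once in acts [n, d_sae]."""
    return (acts > 0).any(dim=0).float().mean().item()
```

Under this definition a perfect reconstruction scores 1.0 and predicting the batch mean scores 0.0.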

Training is ongoing; the weights in this repository are updated with each checkpoint push.

## Hyperparameters

| Parameter | Value |
|---|---|
| Architecture | BatchTopK (Bussmann et al., arXiv:2412.06410) |
| d_in | 1536 |
| d_sae | 24576 (16× expansion) |
| k | 64 |
| Seed | 17 |
| Layer | 17 |
| Base model | google/gemma-4-E2B |
| Quantization | bitsandbytes NF4, fp16 compute |
| Optimizer | Adam, lr=3e-4 |
| Batch size | 1024 activations |
| Dataset | HuggingFaceFW/fineweb-edu (sample-10BT), streaming, seed=17 |
| Aux-k coefficient | 0.0625 |
| Decoder norm | Unit-norm per Gemma Scope recipe |
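
For readers unfamiliar with the architecture, the following is a minimal sketch of a BatchTopK activation as it is usually described: during training, the `k * batch_size` largest post-ReLU activations are kept across the whole batch, so individual examples may use more or fewer than `k` features. The function name `batch_topk` is hypothetical, and this is not necessarily the exact code used for this run.

```python
import torch


def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k * batch_size largest post-ReLU activations in the batch.

    pre_acts: [batch, d_sae] encoder pre-activations.
    Returns a tensor of the same shape with all other entries set to zero.
    """
    acts = torch.relu(pre_acts)
    n_keep = k * acts.shape[0]          # budget shared by the whole batch
    flat = acts.flatten()
    top_vals, top_idx = flat.topk(n_keep)
    sparse = torch.zeros_like(flat)
    sparse[top_idx] = top_vals
    return sparse.view_as(acts)
```

With `k = 64` and a batch of 1,024 activations this keeps at most 65,536 nonzero entries per batch, which gives an average L0 of 64.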

## Usage

Load weights with SAELens-compatible state-dict keys:

```python
import json
import torch
from huggingface_hub import hf_hub_download

repo = "Solshine/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17"
# hf_hub_download returns the local path of the cached file.
with open(hf_hub_download(repo, "cfg.json")) as f:
    cfg = json.load(f)
state = torch.load(hf_hub_download(repo, "sae_weights.pt"), map_location="cpu", weights_only=True)
# Keys: W_enc [d_in, d_sae], W_dec [d_sae, d_in], b_enc [d_sae], b_dec [d_in]
```
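
Because the weights change with each checkpoint push, it can help to confirm that a downloaded state dict matches the shapes listed above before using it. A small sketch, with the hypothetical helper name `check_sae_shapes` and the default dimensions taken from the Hyperparameters table:

```python
def check_sae_shapes(state, d_in=1536, d_sae=24576):
    """Raise if a required key is missing or has an unexpected shape.

    Extra keys in the state dict are ignored.
    """
    expected = {
        "W_enc": (d_in, d_sae),
        "W_dec": (d_sae, d_in),
        "b_enc": (d_sae,),
        "b_dec": (d_in,),
    }
    for name, shape in expected.items():
        if name not in state:
            raise KeyError(f"missing key: {name}")
        actual = tuple(state[name].shape)
        if actual != shape:
            raise ValueError(f"{name}: expected {shape}, got {actual}")
```

Calling `check_sae_shapes(state)` after loading returns silently when the four tensors are present with the expected shapes.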

Hook into Gemma 4 E2B at layer 17 to collect residual-stream activations,
then encode them with the SAE. At inference, apply TopK per example (keep the
64 largest activations for each token) rather than across the batch.
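
A minimal sketch of per-example TopK encoding and decoding with the state-dict tensors is shown below. The function names are hypothetical, and two details are assumptions rather than facts from this card: that a ReLU is applied before the top-k selection, and whether the input is centered by `b_dec` before encoding (exposed here as the `center_input` flag; check `cfg.json` or the training code). The model hook is omitted because the module path for layer 17 depends on the model class you load.

```python
import torch


def encode_topk(x, W_enc, b_enc, b_dec, k=64, center_input=False):
    """Encode activations x [n, d_in], keeping the k largest features per row."""
    x = x.to(W_enc.dtype)               # e.g. fp16 activations, fp32 SAE weights
    if center_input:                    # assumption: some SAE variants subtract b_dec
        x = x - b_dec
    acts = torch.relu(x @ W_enc + b_enc)
    vals, idx = acts.topk(k, dim=-1)    # top-k within each example, not the batch
    return torch.zeros_like(acts).scatter_(-1, idx, vals)


def decode(acts, W_dec, b_dec):
    """Reconstruct residual-stream activations [n, d_in] from features [n, d_sae]."""
    return acts @ W_dec + b_dec
```

Given a tensor `resid` of shape `[n, 1536]` collected at layer 17, `encode_topk(resid, state["W_enc"], state["b_enc"], state["b_dec"])` returns a `[n, 24576]` tensor with at most 64 nonzero entries per row.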

## Research context

This SAE is part of an ongoing deception-interpretability research program
examining whether behavioral distinctions (honest vs. deceptive model outputs)
leave recoverable traces in SAE feature space. Training on the pretraining
distribution (FineWeb-Edu) establishes a general-purpose feature vocabulary
for Gemma 4 E2B; subsequent experiments probe this vocabulary against
decision-incentive behavioral scenarios.

Live W&B run: https://wandb.ai/caleb-deleeuw/gemma4-sae-scope/runs/gemma-4-e2b-scope-v1-L17-batchtopk-k64-seed17