metadata
license: apache-2.0
base_model: EleutherAI/pythia-410m
library_name: pytorch
tags:
- sparse-autoencoder
- sae
- interpretability
- pythia
pythia-410m-saes-x32-l1-3e-4-fixed — Sparse Autoencoders on Pythia-410M (run_exp_2_t1)
Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of
EleutherAI/pythia-410m, for the COLM SAE scaling-law experiments
(source code on GitHub,
full codebase on HF).
Contents
| Property | Value |
|---|---|
| Base model | EleutherAI/pythia-410m |
| Layers covered | 0–23 (all 24) |
| SAE expansion factor | 32 → F = 32,768 dictionary features per layer |
| Hidden dim being modeled | 1024 (Pythia-410M residual stream) |
| L1 coefficient | 3e-4 (fixed) |
| Tokens trained | 300 M (the Pile) |
| Snapshots per layer | 6 — at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus final |
| Total files | 144 .pt checkpoints (24 layers × 6 snapshots) |
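For orientation, the L1 coefficient in the table weights the sparsity penalty in the training loss. This card does not state the exact objective. Assuming the common formulation with a ReLU encoder and an L1 penalty on the feature activations (an assumption, not confirmed by the source), it would read as follows, with $d = 1024$, $F = 32 \times 1024 = 32{,}768$ and $\lambda = 3 \times 10^{-4}$:

```latex
% Assumed standard SAE objective; the repo's exact loss may differ
% (e.g. decoder-norm weighting or a different reconstruction normalization).
\begin{aligned}
f(x) &= \mathrm{ReLU}(W_{\mathrm{enc}}\, x + b_{\mathrm{enc}}), & W_{\mathrm{enc}} &\in \mathbb{R}^{F \times d},\\
\hat{x} &= W_{\mathrm{dec}}\, f(x) + b_{\mathrm{dec}}, & W_{\mathrm{dec}} &\in \mathbb{R}^{d \times F},\\
\mathcal{L}(x) &= \lVert x - \hat{x} \rVert_2^2 + \lambda\, \lVert f(x) \rVert_1 .
\end{aligned}
```

The symbol names ($W_{\mathrm{enc}}$, $W_{\mathrm{dec}}$, $b_{\mathrm{enc}}$, $b_{\mathrm{dec}}$) are illustrative and need not match the parameter names stored in the checkpoints.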
File naming
sae_layer{LL}_{SNAPSHOT}.pt
Where LL is the layer index (00–23) and SNAPSHOT is one of
50M, 100M, 150M, 200M, 250M, final.
Examples:
- sae_layer00_50M.pt
- sae_layer12_final.pt
- sae_layer23_250M.pt
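The naming scheme can also be enumerated programmatically, for example to iterate over every checkpoint in the repo. A minimal sketch; the helper names `checkpoint_filename` and `all_checkpoint_filenames` and the two constants are illustrative and not part of the source repo:

```python
# Enumerate the checkpoint filenames implied by sae_layer{LL}_{SNAPSHOT}.pt.
NUM_LAYERS = 24  # layers 0-23
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]


def checkpoint_filename(layer: int, snapshot: str) -> str:
    """Build one filename with a zero-padded, two-digit layer index."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0..{NUM_LAYERS - 1}, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot {snapshot!r}")
    return f"sae_layer{layer:02d}_{snapshot}.pt"


def all_checkpoint_filenames() -> list[str]:
    """Return all filenames, ordered by layer and then by snapshot."""
    return [
        checkpoint_filename(layer, snapshot)
        for layer in range(NUM_LAYERS)
        for snapshot in SNAPSHOTS
    ]
```

The list has 24 × 6 = 144 entries, matching the file count in the Contents table.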
Loading
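import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint (layer 12, final snapshot) from the Hub.
ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
    filename="sae_layer12_final.pt",
)

# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)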
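This card does not document the layout of the loaded object, so it is worth inspecting before wiring it into a model. A minimal sketch that lists names and shapes, assuming `state` is a (possibly nested) dict whose tensor leaves expose a `.shape` attribute; `summarize_checkpoint` is a hypothetical helper, not part of the source repo:

```python
def summarize_checkpoint(state, prefix=""):
    """Return (name, shape) pairs for every leaf of a nested dict.

    Assumption: the checkpoint is a dict (possibly nested) of tensors.
    Leaves without a .shape attribute (ints, floats, strings) are reported
    with their Python type name instead of a shape.
    """
    rows = []
    if isinstance(state, dict):
        for key, value in state.items():
            rows.extend(summarize_checkpoint(value, f"{prefix}{key}."))
    else:
        name = prefix.rstrip(".") or "<root>"
        shape = getattr(state, "shape", None)
        rows.append((name, tuple(shape) if shape is not None else type(state).__name__))
    return rows
```

Calling `summarize_checkpoint(state)` on a loaded checkpoint and printing the pairs shows what is stored. Given the expansion factor of 32 on a 1024-dimensional residual stream, weight matrices with dimensions 1024 and 32,768 would be expected, but the parameter names are not stated here.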
Sister runs (same setup, different L1 coefficient)
| run | L1 coefficient | target |
|---|---|---|
| pythia-410m-saes-x32-l1-adaptive | 5e-4 (adaptive) | target L0 ≈ 150 |
| pythia-410m-saes-x32-l1-3e-4-fixed | 3e-4 | fixed |
| pythia-410m-saes-x32-l1-8e-5-fixed | 8e-5 | fixed |
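To compare the same layer across the three runs, the repo and file names can be paired up. A rough sketch under two loud assumptions that this card does not confirm: the sister repos live under the same namespace as the repo in the loading snippet, and they use the same file naming. The helper `sister_checkpoints` and the constants are hypothetical:

```python
# Assumption: the sister repos share the namespace of the repo used in the
# loading snippet and follow the same sae_layer{LL}_{SNAPSHOT}.pt naming.
# Verify both on the Hub before relying on this.
NAMESPACE = "nileshsarkar-ai"
RUNS = [
    "pythia-410m-saes-x32-l1-adaptive",    # 5e-4 (adaptive), target L0 ≈ 150
    "pythia-410m-saes-x32-l1-3e-4-fixed",  # 3e-4, fixed
    "pythia-410m-saes-x32-l1-8e-5-fixed",  # 8e-5, fixed
]


def sister_checkpoints(layer, snapshot="final", namespace=NAMESPACE):
    """Return (repo_id, filename) pairs for one layer and snapshot in every run."""
    filename = f"sae_layer{layer:02d}_{snapshot}.pt"
    return [(f"{namespace}/{run}", filename) for run in RUNS]
```

Each pair can be passed to `hf_hub_download(repo_id=..., filename=...)` as in the loading snippet.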
Reproducing
The training script is at run_exp_2_t1/run_exp.py in the source repo. Hardware: NVIDIA A100 80 GB PCIe.
python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4
Related artifacts
- Per-layer results and heatmaps on GitHub: run_exp_2_t1/results/
- Backup-restore doc: COLM_BACKUP_RESTORE.md


