metadata
license: apache-2.0
base_model: EleutherAI/pythia-410m
library_name: pytorch
tags:
  - sparse-autoencoder
  - sae
  - interpretability
  - pythia

pythia-410m-saes-x32-l1-3e-4-fixed — Sparse Autoencoders on Pythia-410M (run_exp_2_t1)

Figure: Per-layer metrics heatmap

Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of EleutherAI/pythia-410m, for the COLM SAE scaling-law experiments (source code on GitHub, full codebase on HF).

Figure: Training curves across all 24 layers
Figure: Loss-floor predictions (predicted vs. measured loss floor)

Contents

Base model: EleutherAI/pythia-410m
Layers covered: 0–23 (all 24)
SAE expansion factor: 32× (F = 32,768 dictionary features per layer)
Hidden dim being modeled: 1024 (Pythia-410M residual stream)
L1 coefficient: 3e-4 (fixed)
Tokens trained: 300 M (The Pile)
Snapshots per layer: 6 (at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus final)
Total files: 144 .pt checkpoints (24 layers × 6 snapshots)

File naming

sae_layer{LL}_{SNAPSHOT}.pt

Where LL is the zero-padded layer index (00–23) and SNAPSHOT is one of 50M, 100M, 150M, 200M, 250M, or final.

Examples:

  • sae_layer00_50M.pt
  • sae_layer12_final.pt
  • sae_layer23_250M.pt
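
The naming pattern above can be enumerated programmatically, which is handy for downloading or iterating over every checkpoint. The following is a minimal sketch; the helper name `checkpoint_filename` and the constants are illustrative and not part of the repository.

```python
# Snapshot labels and layer count as listed in this card.
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]
NUM_LAYERS = 24


def checkpoint_filename(layer: int, snapshot: str) -> str:
    """Build a filename following the sae_layer{LL}_{SNAPSHOT}.pt pattern."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0..{NUM_LAYERS - 1}, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot {snapshot!r}")
    # {layer:02d} zero-pads the layer index to two digits (00..23).
    return f"sae_layer{layer:02d}_{snapshot}.pt"


# All 24 layers x 6 snapshots, in layer-major order.
ALL_FILENAMES = [
    checkpoint_filename(layer, snap)
    for layer in range(NUM_LAYERS)
    for snap in SNAPSHOTS
]
```

Any entry of `ALL_FILENAMES` can be passed as the `filename` argument to `hf_hub_download`.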

Loading

import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
    filename="sae_layer12_final.pt",
)
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)
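
This card does not document the key names inside each checkpoint, so it may help to inspect the loaded object before wiring it into a model. The sketch below makes no assumption about key names: it walks any nested mapping and reports the shape of every tensor-like leaf. The helper `summarize_checkpoint` is hypothetical and not part of the repository.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, prefix=""):
    """Return (name, shape) pairs for every tensor-like leaf in a nested mapping."""
    rows = []
    if isinstance(obj, Mapping):
        # Recurse into dicts / OrderedDicts, joining nested keys with dots.
        for key, value in obj.items():
            rows.extend(summarize_checkpoint(value, f"{prefix}{key}."))
    elif hasattr(obj, "shape"):
        # Tensors and arrays expose .shape; store it as a plain tuple.
        rows.append((prefix.rstrip("."), tuple(obj.shape)))
    # Leaves without a .shape (ints, strings, ...) are skipped.
    return rows
```

With `state` from the snippet above, `for name, shape in summarize_checkpoint(state): print(name, shape)` lists what the file contains. Given the table in Contents, one would expect shapes involving 1024 and 32,768 to appear, but the exact layout is not specified here.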

Sister runs (same setup, different L1 coefficient)

  • pythia-410m-saes-x32-l1-adaptive: L1 coefficient 5e-4 (adaptive), target L0 ≈ 150
  • pythia-410m-saes-x32-l1-3e-4-fixed: L1 coefficient 3e-4, fixed
  • pythia-410m-saes-x32-l1-8e-5-fixed: L1 coefficient 8e-5, fixed

Reproducing

The training script is at run_exp_2_t1/run_exp.py in the source repo. Hardware: NVIDIA A100 80 GB PCIe.

python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4
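
For readers wondering how values such as `300_000_000` and `3e-4` are interpreted on the command line, here is a hypothetical parser that mirrors the flags in the command above. The flag names are copied from that command; the argument types are assumptions, and the actual `run_exp.py` may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical: flag names come from the command above, types are assumed.
    parser = argparse.ArgumentParser(prog="run_exp.py")
    parser.add_argument("--phase", type=str)
    parser.add_argument("--num_tokens", type=int)  # int() accepts underscore separators
    parser.add_argument("--expansion", type=int)
    parser.add_argument("--l1_coeff", type=float)  # float() accepts scientific notation
    return parser


# Parse the same arguments as the reproduction command.
args = build_parser().parse_args(
    ["--phase", "train", "--num_tokens", "300_000_000",
     "--expansion", "32", "--l1_coeff", "3e-4"]
)
```

Under these assumed types, `args.num_tokens` is the integer 300000000 and `args.l1_coeff` is the float 0.0003.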

Related artifacts