---
license: apache-2.0
base_model: EleutherAI/pythia-410m
library_name: pytorch
tags:
  - sparse-autoencoder
  - sae
  - interpretability
  - pythia
---

# pythia-410m-saes-x32-l1-3e-4-fixed — Sparse Autoencoders on Pythia-410M (run_exp_2_t1)

![Per-layer metrics heatmap](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/2_metrics_heatmap.png)

Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of
`EleutherAI/pythia-410m`, for the COLM SAE scaling-law experiments
([source code on GitHub](https://github.com/nileshsarkar-ai/Erdos-AI-Labs),
[full codebase on HF](https://huggingface.co/nileshsarkar-ai/sae-encoders)).

| | |
|---|---|
| ![Training curves](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/1_training_curves.png) | ![Loss-floor predictions](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/8_loss_floor_predictions.png) |
| Training curves across all 24 layers | Predicted vs measured loss floor |

## Contents

| | |
|---|---|
| Base model | [`EleutherAI/pythia-410m`](https://huggingface.co/EleutherAI/pythia-410m) |
| Layers covered | 0–23 (all 24) |
| SAE expansion factor | **32** → `F = 32,768` dictionary features per layer |
| Residual-stream width (`d_model`) | 1024 (Pythia-410M) |
| L1 coefficient | `3e-4` (fixed) |
| Tokens trained | 300M (the Pile) |
| Snapshots per layer | 6: at 50M, 100M, 150M, 200M, and 250M tokens, plus `final` (end of training) |
| Total files | **144** `.pt` checkpoints (24 layers × 6 snapshots) |
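As a rough sizing aid, the sketch below works through the numbers in the table and estimates the parameter count of one SAE. It assumes the common untied layout (encoder matrix, decoder matrix, encoder bias, decoder bias). This card does not document the actual checkpoint layout, so the parameter count is an estimate and the constant names are illustrative.

```python
# Rough size estimate for one SAE checkpoint.
# ASSUMPTION: untied W_enc / W_dec plus one bias vector each; the real
# checkpoints may store extra tensors (e.g. optimizer state or metadata).
D_MODEL = 1024      # Pythia-410M residual-stream width
EXPANSION = 32      # SAE expansion factor
N_LAYERS = 24       # layers 0-23
N_SNAPSHOTS = 6     # 50M, 100M, 150M, 200M, 250M, final

n_features = D_MODEL * EXPANSION                           # dictionary size F
params = 2 * D_MODEL * n_features + n_features + D_MODEL   # W_enc, W_dec, b_enc, b_dec
fp32_mib = params * 4 / 2**20                              # 4 bytes per fp32 value

print(f"F = {n_features:,}")
print(f"params per SAE ~ {params:,} (~{fp32_mib:.0f} MiB in fp32)")
print(f"checkpoint files = {N_LAYERS * N_SNAPSHOTS}")
```

Under that assumption, each SAE holds about 67.1M parameters, or roughly 256 MiB in fp32. This is a useful figure when deciding how many layers to load at once.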

## File naming

```
sae_layer{LL}_{SNAPSHOT}.pt
```

Where `LL` is the layer index (`00`–`23`) and `SNAPSHOT` is one of
`50M, 100M, 150M, 200M, 250M, final`.

Examples:
- `sae_layer00_50M.pt`
- `sae_layer12_final.pt`
- `sae_layer23_250M.pt`
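The naming scheme is regular enough to generate every filename programmatically, for example to download a whole layer sweep. The helper below is illustrative and is not part of the source repo.

```python
# Enumerate every checkpoint filename implied by the naming scheme.
# HYPOTHETICAL helper for illustration; not taken from the repo.
LAYERS = range(24)
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]


def checkpoint_name(layer: int, snapshot: str) -> str:
    """Build a filename of the form sae_layer{LL}_{SNAPSHOT}.pt."""
    if layer not in LAYERS:
        raise ValueError(f"layer must be in 0-23, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot {snapshot!r}")
    return f"sae_layer{layer:02d}_{snapshot}.pt"


all_files = [checkpoint_name(layer, snap) for layer in LAYERS for snap in SNAPSHOTS]
print(len(all_files))             # 144
print(all_files[0], all_files[-1])
```

Each name can be passed as the `filename` argument to `hf_hub_download`, as in the loading example.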

## Loading

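```python
import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint (layer 12, end of training); returns a local cache path.
ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
    filename="sae_layer12_final.pt",
)
# weights_only=True restricts unpickling to tensors and plain containers.
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)
```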
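This card does not list the key names inside each checkpoint, so inspect `state.keys()` first. The sketch below is a minimal ReLU SAE of the standard form f = ReLU((x − b_dec) W_enc + b_enc), x̂ = f W_dec + b_dec. Its parameter names `W_enc`, `b_enc`, `W_dec`, and `b_dec` are hypothetical. `sae.load_state_dict(state)` will only succeed if those names and shapes match the checkpoint, so you may need to rename keys. The demo uses random weights and a reduced width so it runs without downloading anything.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Minimal ReLU SAE sketch. Parameter names are HYPOTHETICAL and may not
    match the keys stored in these checkpoints."""

    def __init__(self, d_model: int = 1024, expansion: int = 32) -> None:
        super().__init__()
        n_features = d_model * expansion  # F = 32,768 for the defaults
        self.W_enc = nn.Parameter(torch.empty(d_model, n_features))
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        self.W_dec = nn.Parameter(torch.empty(n_features, d_model))
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        nn.init.kaiming_uniform_(self.W_enc)
        nn.init.kaiming_uniform_(self.W_dec)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Feature activations: non-negative, ideally mostly zero.
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        # Reconstruct the residual-stream vector from feature activations.
        return f @ self.W_dec + self.b_dec

    def forward(self, x: torch.Tensor):
        f = self.encode(x)
        return self.decode(f), f


# Demo with random weights and a reduced width. For a real checkpoint, use
# d_model=1024, expansion=32 and sae.load_state_dict(state) (keys permitting).
torch.manual_seed(0)
sae = SparseAutoencoder(d_model=64, expansion=4)
x = torch.randn(8, 64)                           # stand-in for residual-stream activations
x_hat, f = sae(x)
mse = (x_hat - x).pow(2).mean().item()           # reconstruction error
l0 = (f > 0).float().sum(dim=-1).mean().item()   # mean active features per token
print(f"recon MSE={mse:.3f}  L0={l0:.1f}")
```

With trained weights, L0 is the sparsity figure that the L1 coefficient trades off against reconstruction error.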

## Sister runs (same setup, different L1 coefficient)

| Run | L1 coefficient | L1 schedule |
|---|---|---|
| [pythia-410m-saes-x32-l1-adaptive](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive) | `5e-4` | adaptive, targeting `L0 ≈ 150` |
| **[pythia-410m-saes-x32-l1-3e-4-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed)** (this run) | `3e-4` | fixed |
| [pythia-410m-saes-x32-l1-8e-5-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed) | `8e-5` | fixed |
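This card does not say how the adaptive run adjusts its coefficient toward `L0 ≈ 150`. One simple scheme is shown below purely as a hypothetical illustration, not the repo's actual rule. It nudges the L1 coefficient multiplicatively: up when too many features fire, down when too few do. The function name, `rate`, and the clamping bounds are all assumptions.

```python
def update_l1_coeff(
    l1_coeff: float,
    observed_l0: float,
    target_l0: float = 150.0,
    rate: float = 0.01,
    min_coeff: float = 1e-6,
    max_coeff: float = 1e-2,
) -> float:
    """HYPOTHETICAL multiplicative L1 controller; not the repo's actual rule."""
    if observed_l0 > target_l0:
        l1_coeff *= 1 + rate   # too many active features: penalize harder
    elif observed_l0 < target_l0:
        l1_coeff *= 1 - rate   # too few active features: relax the penalty
    # Keep the coefficient within a sane range.
    return min(max(l1_coeff, min_coeff), max_coeff)


coeff = 5e-4
coeff = update_l1_coeff(coeff, observed_l0=220.0)  # L0 above target -> coefficient rises
print(coeff)
```

A fixed-coefficient run like this one skips any such update, so its L0 is whatever the `3e-4` penalty settles to at each layer.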

## Reproducing

The training script is
[`run_exp_2_t1/run_exp.py`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/run_exp_2_t1/run_exp.py)
in the source repo. Training ran on a single NVIDIA A100 80 GB PCIe.

```bash
# Run from the run_exp_2_t1/ directory of the source repo.
cd run_exp_2_t1
python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4
```

## Related artifacts

- Per-layer results and heatmaps on GitHub:
  [`run_exp_2_t1/results/`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/tree/master/run_exp_2_t1/results).
- Backup-restore doc: [`COLM_BACKUP_RESTORE.md`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/COLM_BACKUP_RESTORE.md).