---
license: apache-2.0
base_model: EleutherAI/pythia-410m
library_name: pytorch
tags:
- sparse-autoencoder
- sae
- interpretability
- pythia
---
# pythia-410m-saes-x32-l1-3e-4-fixed: Sparse Autoencoders on Pythia-410M (run_exp_2_t1)
![Per-layer metrics heatmap](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/2_metrics_heatmap.png)
Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of
`EleutherAI/pythia-410m`, for the COLM SAE scaling-law experiments
([source code on GitHub](https://github.com/nileshsarkar-ai/Erdos-AI-Labs),
[full codebase on HF](https://huggingface.co/nileshsarkar-ai/sae-encoders)).
| | |
|---|---|
| ![Training curves](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/1_training_curves.png) | ![Loss-floor predictions](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/8_loss_floor_predictions.png) |
| Training curves across all 24 layers | Predicted vs measured loss floor |
## Contents
| | |
|---|---|
| Base model | [`EleutherAI/pythia-410m`](https://huggingface.co/EleutherAI/pythia-410m) |
| Layers covered | 0–23 (all 24) |
| SAE expansion factor | **32** → `F = 32,768` dictionary features per layer |
| Hidden dim being modeled | 1024 (Pythia-410M residual stream) |
| L1 coefficient | `3e-4` (fixed) |
| Tokens trained | 300 M (the Pile) |
| Snapshots per layer | 6: at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus `final` |
| Total files | **144** `.pt` checkpoints (24 layers × 6 snapshots) |
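
The dictionary size and file count in the table follow from a few constants. The sketch below spells out that arithmetic; the variable names are illustrative, and mapping `final` to the full 300 M tokens is an assumption based on the stated training budget rather than something this card confirms.

```python
# Constants taken from the Contents table; the names are illustrative only.
D_MODEL = 1024    # Pythia-410M residual-stream width
EXPANSION = 32    # SAE expansion factor
N_LAYERS = 24     # layers 0-23

# Token count at each snapshot. Treating "final" as the end of the
# 300 M-token run is an assumption, not something the card states.
SNAPSHOT_TOKENS = {
    "50M": 50_000_000,
    "100M": 100_000_000,
    "150M": 150_000_000,
    "200M": 200_000_000,
    "250M": 250_000_000,
    "final": 300_000_000,
}

n_features = D_MODEL * EXPANSION                 # dictionary features per layer
n_checkpoints = N_LAYERS * len(SNAPSHOT_TOKENS)  # one file per layer per snapshot

print(f"F = {n_features:,}")             # F = 32,768
print(f"checkpoints = {n_checkpoints}")  # checkpoints = 144
```
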
## File naming
```
sae_layer{LL}_{SNAPSHOT}.pt
```
Where `LL` is the layer index (`00`–`23`) and `SNAPSHOT` is one of
`50M, 100M, 150M, 200M, 250M, final`.
Examples:
- `sae_layer00_50M.pt`
- `sae_layer12_final.pt`
- `sae_layer23_250M.pt`
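
The naming pattern above can be generated programmatically, which helps when iterating over layers or snapshots. This is a minimal sketch; `checkpoint_name` is a hypothetical helper written for this card, not part of the training code.

```python
LAYERS = range(24)
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]


def checkpoint_name(layer: int, snapshot: str) -> str:
    """Build a file name following the sae_layer{LL}_{SNAPSHOT}.pt pattern."""
    if layer not in LAYERS:
        raise ValueError(f"layer must be in 0..23, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot {snapshot!r}")
    # Layer index is zero-padded to two digits, e.g. 0 -> "00".
    return f"sae_layer{layer:02d}_{snapshot}.pt"


# Every checkpoint name, ordered by layer and then by snapshot.
all_names = [checkpoint_name(layer, snap) for layer in LAYERS for snap in SNAPSHOTS]

print(len(all_names))                # 144
print(all_names[0], all_names[-1])   # sae_layer00_50M.pt sae_layer23_final.pt
```

Any of these names can be passed as the `filename` argument of `hf_hub_download`.
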
## Loading
```python
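import torch
from huggingface_hub import hf_hub_download

# Download one checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
    filename="sae_layer12_final.pt",
)
# weights_only=True limits unpickling to tensors and other allow-listed plain types.
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)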
```
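
This card does not list the keys stored in each `.pt` file, so it may be worth inspecting a checkpoint before wiring it into a model class. The sketch below assumes the loaded object is a (possibly nested) dict; `describe_checkpoint` is a hypothetical helper, and the key names in the stand-in data are made up so the example runs without a download.

```python
from types import SimpleNamespace


def describe_checkpoint(state, prefix=""):
    """Yield (key, description) pairs for a loaded checkpoint dict.

    Objects with a .shape (such as tensors) are described by their shape,
    nested dicts are walked recursively, and anything else is described
    by its type name.
    """
    for key, value in state.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from describe_checkpoint(value, prefix=f"{name}.")
        elif hasattr(value, "shape"):
            yield name, tuple(value.shape)
        else:
            yield name, type(value).__name__


# Stand-in for a loaded checkpoint. These key names are invented for the
# demo; the real checkpoint keys may differ.
fake_state = {
    "encoder.weight": SimpleNamespace(shape=(32768, 1024)),
    "meta": {"step": 1000},
}

summary = list(describe_checkpoint(fake_state))
for name, info in summary:
    print(name, info)
```

With a real checkpoint, call `describe_checkpoint(state)` on the object returned by `torch.load` in the loading snippet.
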
## Sister runs (same setup, different L1 coefficient)
| Run | L1 coefficient | L1 schedule |
|---|---|---|
| [pythia-410m-saes-x32-l1-adaptive](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive) | `5e-4` | adaptive, target `L0 ≈ 150` |
| **[pythia-410m-saes-x32-l1-3e-4-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed)** | `3e-4` | fixed |
| [pythia-410m-saes-x32-l1-8e-5-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed) | `8e-5` | fixed |
## Reproducing
The training script is
[`run_exp_2_t1/run_exp.py`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/run_exp_2_t1/run_exp.py)
in the source repo. Hardware: NVIDIA A100 80 GB PCIe.
```bash
python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4
```
## Related artifacts
- Per-layer results and heatmaps on GitHub:
[`run_exp_2_t1/results/`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/tree/master/run_exp_2_t1/results).
- Backup-restore doc: [`COLM_BACKUP_RESTORE.md`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/COLM_BACKUP_RESTORE.md).