| --- |
| license: apache-2.0 |
| base_model: EleutherAI/pythia-410m |
| library_name: pytorch |
| tags: |
| - sparse-autoencoder |
| - sae |
| - interpretability |
| - pythia |
| --- |
| |
| # pythia-410m-saes-x32-l1-3e-4-fixed: Sparse Autoencoders on Pythia-410M (run_exp_2_t1) |
| |
| Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of |
| `EleutherAI/pythia-410m`, for the COLM SAE scaling-law experiments |
| ([source code on GitHub](https://github.com/nileshsarkar-ai/Erdos-AI-Labs), |
| [full codebase on HF](https://huggingface.co/nileshsarkar-ai/sae-encoders)). |
| |
| Figures: training curves across all 24 layers; predicted vs measured loss floor. |
| |
| ## Contents |
| |
| Property | Value |
| |---|---| |
| | Base model | [`EleutherAI/pythia-410m`](https://huggingface.co/EleutherAI/pythia-410m) | |
| Layers covered | 0–23 (all 24) |
| SAE expansion factor | **32**, giving `F = 32,768` dictionary features per layer |
| Hidden dim being modeled | 1024 (Pythia-410M residual stream) |
| L1 coefficient | `3e-4` (fixed) |
| Tokens trained | 300 M (PILE) |
| Snapshots per layer | 6: at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus `final` |
| Total files | **144** `.pt` checkpoints (24 layers × 6 snapshots) |
| |
| ## File naming |
| |
| ``` |
| sae_layer{LL}_{SNAPSHOT}.pt |
| ``` |
| |
| `LL` is the zero-padded layer index (`00`–`23`) and `SNAPSHOT` is one of |
| `50M`, `100M`, `150M`, `200M`, `250M`, or `final`. |
| |
| Examples: |
| - `sae_layer00_50M.pt` |
| - `sae_layer12_final.pt` |
| - `sae_layer23_250M.pt` |
| |
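The naming scheme above can be expanded programmatically. The sketch below enumerates every checkpoint filename; the helper name `checkpoint_filenames` is hypothetical and is not part of the source repository.

```python
from itertools import product

NUM_LAYERS = 24  # Pythia-410M layers 0-23
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]


def checkpoint_filenames(num_layers=NUM_LAYERS, snapshots=SNAPSHOTS):
    """Return every checkpoint filename in layer-major order (hypothetical helper)."""
    return [
        f"sae_layer{layer:02d}_{snap}.pt"
        for layer, snap in product(range(num_layers), snapshots)
    ]


filenames = checkpoint_filenames()
print(len(filenames))  # 144 (24 layers x 6 snapshots)
print(filenames[0], filenames[-1])  # sae_layer00_50M.pt sae_layer23_final.pt
```

Any entry of `filenames` can be passed as the `filename` argument of `hf_hub_download`.
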
| ## Loading |
| |
| ```python |
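| import torch |
| from huggingface_hub import hf_hub_download |
|
| # Download one checkpoint (layer 12, final snapshot) from the Hub; the file is cached locally. |
| ckpt_path = hf_hub_download( |
|     repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed", |
|     filename="sae_layer12_final.pt", |
| ) |
| # weights_only=True limits unpickling to tensors and basic Python types. |
| state = torch.load(ckpt_path, map_location="cpu", weights_only=True) |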
| ``` |
| |
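This card does not document what the loaded object contains. Assuming it is a flat mapping from parameter names to tensors (a plain `state_dict`), a small helper like the one below lists each entry's shape. `summarize_state` is a hypothetical name, and the stand-in class and the keys `W_enc`, `b_enc`, and `step` in the demo are placeholders only, not the checkpoint's real layout.

```python
def summarize_state(state):
    """Map each entry name to its shape as a tuple (None if it has no shape)."""
    summary = {}
    for name, value in state.items():
        shape = getattr(value, "shape", None)
        summary[name] = tuple(shape) if shape is not None else None
    return summary


class _FakeTensor:
    """Stand-in for torch.Tensor so the demo runs without a download."""

    def __init__(self, *shape):
        self.shape = shape


# Placeholder keys and shapes; the real checkpoint may differ.
demo = {"W_enc": _FakeTensor(1024, 32768), "b_enc": _FakeTensor(32768), "step": 1000}
for name, shape in summarize_state(demo).items():
    print(f"{name}: {shape}")
```

With a real checkpoint, pass the `state` object returned by `torch.load` instead of `demo`.
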
| ## Sister runs (same setup, different L1 coefficient) |
|
|
| | run | L1 coefficient | target | |
| |---|---|---| |
| [pythia-410m-saes-x32-l1-adaptive](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive) | `5e-4` (adaptive) | target `L0 ≈ 150` |
| | **[pythia-410m-saes-x32-l1-3e-4-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed)** | `3e-4` | fixed | |
| | [pythia-410m-saes-x32-l1-8e-5-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed) | `8e-5` | fixed | |
|
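The `L0` target of the adaptive run presumably refers to the usual SAE sparsity measure: the average number of non-zero features per token. Under that assumption, it could be computed as sketched below. `mean_l0` is a hypothetical helper and the toy activations are made up for illustration; neither comes from the source repository.

```python
def mean_l0(feature_acts):
    """Average count of non-zero feature activations per token.

    feature_acts: one sequence of feature activations per token.
    """
    if not feature_acts:
        raise ValueError("need at least one token")
    active_counts = [sum(1 for a in row if a != 0) for row in feature_acts]
    return sum(active_counts) / len(active_counts)


# Toy example: 3 tokens, 5 features each (illustrative values only).
toy_acts = [
    [0.0, 1.2, 0.0, 0.3, 0.0],  # 2 active
    [0.0, 0.0, 0.0, 0.0, 0.7],  # 1 active
    [0.5, 0.1, 0.0, 0.9, 0.0],  # 3 active
]
print(mean_l0(toy_acts))  # 2.0
```

On this reading, a run targeting `L0 ≈ 150` would keep about 150 of the 32,768 features active per token on average.
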
|
| ## Reproducing |
|
|
| The training script is |
| [`run_exp_2_t1/run_exp.py`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/run_exp_2_t1/run_exp.py) |
| in the source repo. Hardware: NVIDIA A100 80 GB PCIe. |
|
|
| ```bash |
| python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4 |
| ``` |
|
|
| ## Related artifacts |
|
|
| - Per-layer results and heatmaps on GitHub: |
| [`run_exp_2_t1/results/`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/tree/master/run_exp_2_t1/results). |
| - Backup-restore doc: [`COLM_BACKUP_RESTORE.md`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/COLM_BACKUP_RESTORE.md). |
|
|