---
license: apache-2.0
base_model: EleutherAI/pythia-410m
library_name: pytorch
tags:
- sparse-autoencoder
- sae
- interpretability
- pythia
---

# pythia-410m-saes-x32-l1-8e-5-fixed: Sparse Autoencoders on Pythia-410M (run_exp_2_t2)

![Premium heatmap](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t2/results/SAE_Premium_Heatmap.png)

Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of `EleutherAI/pythia-410m`, for the COLM SAE scaling-law experiments ([source code on GitHub](https://github.com/nileshsarkar-ai/Erdos-AI-Labs), [full codebase on HF](https://huggingface.co/nileshsarkar-ai/sae-encoders)).

| | |
|---|---|
| ![Professional heatmap](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t2/results/SAE_Professional_Heatmap.png) | ![Layer analysis](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t2/results/SAE_Layer_Analysis_Heatmap.png) |
| Publication-style heatmap | Per-layer breakdown |
| ![All-layers compact](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t2/results/SAE_HeatmapVisual_All_Layers.png) | ![KPI summary](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t2/results/SAE_KPI_Summary.png) |
| All-layers compact view | KPI summary card |

## Contents

| | |
|---|---|
| Base model | [`EleutherAI/pythia-410m`](https://huggingface.co/EleutherAI/pythia-410m) |
| Layers covered | 0–23 (all 24) |
| SAE expansion factor | **32** → `F = 32,768` dictionary features per layer |
| Hidden dim being modeled | 1024 (Pythia-410M residual stream) |
| L1 coefficient | `8e-5` (fixed; tuned for paper-exact sum formula) |
| Tokens trained | 300 M (PILE) |
| Snapshots per layer | 6: at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus `final` |
| Total files | **144** `.pt` checkpoints (24 layers × 6 snapshots) |

## File naming

```
sae_layer{LL}_{SNAPSHOT}.pt
```

Where `LL` is the layer index (`00`–`23`) and `SNAPSHOT` is one of `50M`, `100M`, `150M`, `200M`, `250M`, `final`.

Examples:

- `sae_layer00_50M.pt`
- `sae_layer12_final.pt`
- `sae_layer23_250M.pt`

## Loading

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed",
    filename="sae_layer12_final.pt",
)
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)
```

## Sister runs (same setup, different L1 coefficient)

| run | L1 coefficient | target |
|---|---|---|
| [pythia-410m-saes-x32-l1-adaptive](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive) | `5e-4` (adaptive) | target `L0 ≈ 150` |
| [pythia-410m-saes-x32-l1-3e-4-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed) | `3e-4` | fixed |
| **[pythia-410m-saes-x32-l1-8e-5-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed)** | `8e-5` | fixed |

## Reproducing

Training script at [`run_exp_2_t2/run_exp2_t2.py`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/run_exp_2_t2/run_exp2_t2.py) in the source repo. Hardware: NVIDIA A100 80 GB PCIe.

```bash
python run_exp2_t2.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 8e-5
```

## Related artifacts

- Per-layer results and heatmaps on GitHub: [`run_exp_2_t2/results/`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/tree/master/run_exp_2_t2/results).
- Backup-restore doc: [`COLM_BACKUP_RESTORE.md`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/COLM_BACKUP_RESTORE.md).
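
## Enumerating checkpoint names (sketch)

The file-naming scheme and the counts in the Contents table can be checked with a few lines of Python. This is a minimal sketch, not part of the training code: `checkpoint_name`, `LAYERS`, `SNAPSHOTS`, `D_MODEL`, and `EXPANSION` are hypothetical names introduced here, and the values are copied from this card.

```python
from itertools import product

# Values copied from this model card (assumed, not read from the repo).
LAYERS = range(24)                                   # layers 0..23
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]
D_MODEL = 1024                                       # residual-stream width
EXPANSION = 32                                       # SAE expansion factor


def checkpoint_name(layer: int, snapshot: str) -> str:
    """Build a filename following sae_layer{LL}_{SNAPSHOT}.pt."""
    if layer not in LAYERS:
        raise ValueError(f"layer must be in 0..23, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot {snapshot!r}")
    return f"sae_layer{layer:02d}_{snapshot}.pt"


# Every (layer, snapshot) pair gives one checkpoint file.
all_files = [checkpoint_name(l, s) for l, s in product(LAYERS, SNAPSHOTS)]

# Dictionary size per layer: expansion factor times hidden dimension.
n_features = EXPANSION * D_MODEL

print(len(all_files))   # 144
print(n_features)       # 32768
print(all_files[0])     # sae_layer00_50M.pt
```

Any name produced this way can be passed as the `filename` argument of `hf_hub_download` in the Loading snippet. The layout of the loaded object is not documented on this card, so inspect it before relying on particular keys or shapes.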