---
license: apache-2.0
base_model: EleutherAI/pythia-410m
library_name: pytorch
tags:
- sparse-autoencoder
- sae
- interpretability
- pythia
---

# pythia-410m-saes-x32-l1-3e-4-fixed: Sparse Autoencoders on Pythia-410M (run_exp_2_t1)

![Per-layer metrics heatmap](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/2_metrics_heatmap.png)

Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of `EleutherAI/pythia-410m`, for the COLM SAE scaling-law experiments ([source code on GitHub](https://github.com/nileshsarkar-ai/Erdos-AI-Labs), [full codebase on HF](https://huggingface.co/nileshsarkar-ai/sae-encoders)).

| | |
|---|---|
| ![Training curves](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/1_training_curves.png) | ![Loss-floor predictions](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/8_loss_floor_predictions.png) |
| Training curves across all 24 layers | Predicted vs measured loss floor |

## Contents

| | |
|---|---|
| Base model | [`EleutherAI/pythia-410m`](https://huggingface.co/EleutherAI/pythia-410m) |
| Layers covered | 0–23 (all 24) |
| SAE expansion factor | **32** → `F = 32,768` dictionary features per layer |
| Hidden dim being modeled | 1024 (Pythia-410M residual stream) |
| L1 coefficient | `3e-4` (fixed) |
| Tokens trained | 300 M (PILE) |
| Snapshots per layer | 6: at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus `final` |
| Total files | **144** `.pt` checkpoints (24 layers × 6 snapshots) |

## File naming

```
sae_layer{LL}_{SNAPSHOT}.pt
```

Where `LL` is the layer index (`00`–`23`) and `SNAPSHOT` is one of `50M, 100M, 150M, 200M, 250M, final`.

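The naming scheme above can be sketched as a small helper. This is only an illustration: `checkpoint_filename` and `all_checkpoint_filenames` are hypothetical names, not part of the training code. The layer range and snapshot labels are taken from the Contents table.

```python
# Hypothetical helpers (not from the source repo) that build checkpoint
# filenames following the sae_layer{LL}_{SNAPSHOT}.pt pattern.
SNAPSHOTS = ("50M", "100M", "150M", "200M", "250M", "final")
NUM_LAYERS = 24  # Pythia-410M layers 0-23


def checkpoint_filename(layer: int, snapshot: str = "final") -> str:
    """Return the filename for one layer/snapshot pair."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0..{NUM_LAYERS - 1}, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"snapshot must be one of {SNAPSHOTS}, got {snapshot!r}")
    # The layer index is zero-padded to two digits.
    return f"sae_layer{layer:02d}_{snapshot}.pt"


def all_checkpoint_filenames() -> list[str]:
    """Enumerate every checkpoint name: 24 layers x 6 snapshots."""
    return [
        checkpoint_filename(layer, snapshot)
        for layer in range(NUM_LAYERS)
        for snapshot in SNAPSHOTS
    ]
```

Enumerating all names this way yields 144 entries, which matches the file count stated in the Contents table.
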
Examples:

- `sae_layer00_50M.pt`
- `sae_layer12_final.pt`
- `sae_layer23_250M.pt`

## Loading

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
    filename="sae_layer12_final.pt",
)
state = torch.load(ckpt_path, map_location="cpu", weights_only=True)
```

## Sister runs (same setup, different L1 coefficient)

| run | L1 coefficient | target |
|---|---|---|
| [pythia-410m-saes-x32-l1-adaptive](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive) | `5e-4` (adaptive) | target `L0 ≈ 150` |
| **[pythia-410m-saes-x32-l1-3e-4-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed)** | `3e-4` | fixed |
| [pythia-410m-saes-x32-l1-8e-5-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed) | `8e-5` | fixed |

## Reproducing

Training script at [`run_exp_2_t1/run_exp.py`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/run_exp_2_t1/run_exp.py) in the source repo. Hardware: NVIDIA A100 80 GB PCIe.

```bash
python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4
```

## Related artifacts

- Per-layer results and heatmaps on GitHub: [`run_exp_2_t1/results/`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/tree/master/run_exp_2_t1/results).
- Backup-restore doc: [`COLM_BACKUP_RESTORE.md`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/COLM_BACKUP_RESTORE.md).
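
This card does not document the internal layout of the `.pt` files, so it may help to inspect the object returned by `torch.load` in the Loading section before using it. The sketch below is a hypothetical helper, `summarize_checkpoint`, and not part of the source repo. It assumes only that the checkpoint is either a mapping or some other Python object, and that tensor-like values expose a `.shape` attribute.

```python
from collections.abc import Mapping
from typing import Any


def summarize_checkpoint(state: Any) -> dict[str, str]:
    """Describe each top-level entry of a loaded checkpoint.

    Values with a `.shape` attribute (e.g. torch tensors) are reported as
    "TypeName(dim0, dim1, ...)"; everything else is reported by type name.
    """
    if not isinstance(state, Mapping):
        # Not a dict-like object: report only the root type.
        return {"<root>": type(state).__name__}

    summary: dict[str, str] = {}
    for key, value in state.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            summary[str(key)] = f"{type(value).__name__}{tuple(shape)}"
        else:
            summary[str(key)] = type(value).__name__
    return summary
```

With `state` loaded as shown in the Loading section, `for name, desc in summarize_checkpoint(state).items(): print(name, desc)` lists whatever keys the checkpoint actually contains. No particular key names are assumed here.
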