	---
	license: apache-2.0
	base_model: EleutherAI/pythia-410m
	library_name: pytorch
	tags:
	- sparse-autoencoder
	- sae
	- interpretability
	- pythia
	---

	# pythia-410m-saes-x32-l1-3e-4-fixed — Sparse Autoencoders on Pythia-410M (run_exp_2_t1)

	![Per-layer metrics heatmap](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/2_metrics_heatmap.png)

	Sparse Autoencoder (SAE) checkpoints trained on every residual-stream layer of
	`EleutherAI/pythia-410m`, for the COLM SAE scaling-law experiments
	([source code on GitHub](https://github.com/nileshsarkar-ai/Erdos-AI-Labs),
	[full codebase on HF](https://huggingface.co/nileshsarkar-ai/sae-encoders)).

	| | |
	|---|---|
	| ![Training curves](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/1_training_curves.png) | ![Loss-floor predictions](https://huggingface.co/nileshsarkar-ai/sae-encoders/resolve/main/run_exp_2_t1/plots/8_loss_floor_predictions.png) |
	| Training curves across all 24 layers | Predicted vs measured loss floor |

	## Contents

	| Property | Value |
	|---|---|
	| Base model | [`EleutherAI/pythia-410m`](https://huggingface.co/EleutherAI/pythia-410m) |
	| Layers covered | 0–23 (all 24) |
	| SAE expansion factor | 32 → `F = 32,768` dictionary features per layer |
	| Hidden dim being modeled | 1024 (Pythia-410M residual stream) |
	| L1 coefficient | `3e-4` (fixed) |
	| Tokens trained | 300 M (the Pile) |
	| Snapshots per layer | 6: at 50 M, 100 M, 150 M, 200 M, 250 M tokens, plus `final` |
	| Total files | 144 `.pt` checkpoints (24 layers × 6 snapshots) |
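The derived numbers in the table follow from the listed settings. A minimal check, where the constant names are illustrative and not taken from the training code:

```python
# Values copied from the Contents table; names are illustrative only.
D_MODEL = 1024            # Pythia-410M residual-stream width
EXPANSION = 32            # SAE expansion factor
NUM_LAYERS = 24           # layers 0-23
SNAPSHOTS_PER_LAYER = 6   # 50M, 100M, 150M, 200M, 250M, final

# Dictionary size is the expansion factor times the hidden dimension.
dictionary_size = EXPANSION * D_MODEL
# One checkpoint file per (layer, snapshot) pair.
total_checkpoints = NUM_LAYERS * SNAPSHOTS_PER_LAYER

print(dictionary_size)    # 32768
print(total_checkpoints)  # 144
```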

	## File naming

	```
	sae_layer{LL}_{SNAPSHOT}.pt
	```

	Where `LL` is the layer index (`00`–`23`) and `SNAPSHOT` is one of
	`50M, 100M, 150M, 200M, 250M, final`.

	Examples:
	- `sae_layer00_50M.pt`
	- `sae_layer12_final.pt`
	- `sae_layer23_250M.pt`
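The naming scheme above can be turned into a small helper for building filenames. This is a sketch: `checkpoint_name` and `ALL_CHECKPOINTS` are hypothetical names, not part of the repository.

```python
# Snapshot labels and layer count as documented in this card.
SNAPSHOTS = ["50M", "100M", "150M", "200M", "250M", "final"]
NUM_LAYERS = 24


def checkpoint_name(layer: int, snapshot: str) -> str:
    """Build a filename following the sae_layer{LL}_{SNAPSHOT}.pt pattern."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer must be in 0..{NUM_LAYERS - 1}, got {layer}")
    if snapshot not in SNAPSHOTS:
        raise ValueError(f"snapshot must be one of {SNAPSHOTS}, got {snapshot!r}")
    # The layer index is zero-padded to two digits.
    return f"sae_layer{layer:02d}_{snapshot}.pt"


# Every filename implied by the naming scheme, in layer-major order.
ALL_CHECKPOINTS = [
    checkpoint_name(layer, snapshot)
    for layer in range(NUM_LAYERS)
    for snapshot in SNAPSHOTS
]
```

For example, `checkpoint_name(12, "final")` gives `sae_layer12_final.pt`, which can be passed as the `filename` argument when downloading.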

	## Loading

	```python
	import torch
	from huggingface_hub import hf_hub_download

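	# Download a single checkpoint (layer 12, final snapshot) from the Hub.
	ckpt_path = hf_hub_download(
	    repo_id="nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed",
	    filename="sae_layer12_final.pt",
	)
	# weights_only=True restricts unpickling to tensors and plain containers.
	state = torch.load(ckpt_path, map_location="cpu", weights_only=True)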
	```
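This card does not document the layout of the loaded object, so it may help to inspect it before wiring it into a model. The sketch below assumes `state` is a dict-like mapping from names to tensor-like values; `summarize_checkpoint` is a hypothetical helper, not part of the repository.

```python
def summarize_checkpoint(state):
    """Return sorted (name, shape) pairs for a dict-like checkpoint.

    Assumes `state` maps names to values; entries without a `.shape`
    attribute (e.g. plain numbers or strings) are reported with shape None.
    """
    rows = []
    for name, value in state.items():
        shape = getattr(value, "shape", None)
        rows.append((name, tuple(shape) if shape is not None else None))
    # Sort by name so the listing is stable across runs.
    return sorted(rows, key=lambda row: row[0])


# Usage, with `state` loaded as in the snippet above:
# for name, shape in summarize_checkpoint(state):
#     print(name, shape)
```

If the checkpoint turns out to be nested (for example, a dict containing a state dict), apply the helper to the inner mapping instead.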

	## Sister runs (same setup, different L1 coefficient)

	| Run | L1 coefficient | Target |
	|---|---|---|
	| [pythia-410m-saes-x32-l1-adaptive](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-adaptive) | `5e-4` (adaptive) | target `L0 ≈ 150` |
	| [pythia-410m-saes-x32-l1-3e-4-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-3e-4-fixed) (this run) | `3e-4` | fixed |
	| [pythia-410m-saes-x32-l1-8e-5-fixed](https://huggingface.co/nileshsarkar-ai/pythia-410m-saes-x32-l1-8e-5-fixed) | `8e-5` | fixed |

	## Reproducing

	The training script is
	[`run_exp_2_t1/run_exp.py`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/run_exp_2_t1/run_exp.py)
	in the source repo. Hardware: NVIDIA A100 80 GB PCIe.

	```bash
	python run_exp.py --phase train --num_tokens 300_000_000 --expansion 32 --l1_coeff 3e-4
	```

	## Related artifacts

	- Per-layer results and heatmaps on GitHub:
	[`run_exp_2_t1/results/`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/tree/master/run_exp_2_t1/results).
	- Backup-restore doc: [`COLM_BACKUP_RESTORE.md`](https://github.com/nileshsarkar-ai/Erdos-AI-Labs/blob/master/COLM_BACKUP_RESTORE.md).