biohub/ESMC-SAE-Overview

@@ -1,146 +1,155 @@
 ---
-license:
-- mit
 - other
 license_link: https://github.com/Biohub/esm/blob/main/THIRD_PARTY_NOTICE.md
-language:
-- en
-tags:
-- biology
-- esm
-- protein
-- sparse-autoencoder
-- interpretability
-- protein-embeddings
-- feature-extraction
-- protein-language-model
-- unsupervised-learning
 - transformers
 ---
-# ESMC Sparse Autoencoder (SAE) Explanation
-This model card provides an overview of the intended use of the ESMC SAE models and examples of how to access them, but it does not have a specific model or model weights. To access the individual SAE model cards for any of the models, use the format below:
-```py
-https://huggingface.co/Biohub/esmc-<modelsize>-2024-12-sae-sweep<layer><number>-k64-codebook<size>
-```
-The ESMC sparse autoencoders (SAEs) are unsupervised neural networks trained to decompose the learned internal representations from the [ESMC model variants](https://huggingface.co/collections/biohub/esmc-model-family) into a sparse set of more easily interpretable features, revealing what the model “sees” of the user’s protein input. Each feature represents a specific biologically relevant property of the protein, such as a zinc binding site, beta barrel structure, or transmembrane helix.
-The ESM Atlas is a map of 6.8 billion proteins covering the full breadth of life’s biodiversity and more than one billion predicted structures, built upon the [ESMC 6B](https://huggingface.co/biohub/esmc-6b-2024-12) SAEs to translate the model’s internal representations into \~16,000 interpretable biological features. Learn more about how to use the ESM Atlas [here](https://biohub.ai/esmc/atlas).
 ## Intended Use
-* Reconstructing ESMC embeddings into interpretable features
-* Visualizing feature activations on sequences and structures
 ## Usage
-The SAE model `esmc-6b-2024-12-sae-sweep-layer60-k64-codebook16384` provides users with interpretable, agent-generated feature descriptions that are detailed [here](https://huggingface.co/datasets/biohub/ESMC-SAE-Features). Users can access these feature descriptions through the [ESM Atlas](https://biohub.ai/esmc/atlas) or through the [Biohub Platform](https://biohub.ai/).
-While all SAE models can be accessed through Hugging Face, only the 5 SAE models that have normalization statistics are available through the [Biohub Platform](https://biohub.ai/models/esmc). These SAE models are:
-* [esmc-6b-2024-12-sae-sweep-layer60-k64-codebook16384](https://huggingface.co/biohub/esmc-6b-2024-12-sae-sweep-layer60-k64-codebook16384)
-* [Esmc-6b-2024-12-sae-sweep-layer60-k64-codebook65536](https://huggingface.co/biohub/esmc-6b-2024-12-sae-sweep-layer60-k64-codebook65536)
-* [esmc-600m-2024-12-sae-sweep-layer27-k64-codebook16384](https://huggingface.co/biohub/esmc-600m-2024-12-sae-sweep-layer27-k64-codebook16384)
-* [esmc-600m-2024-12-sae-sweep-layer27-k64-codebook65536](https://huggingface.co/biohub/esmc-600m-2024-12-sae-sweep-layer27-k64-codebook65536)
-* [esmc-300m-2024-12-sae-sweep-layer23-k64-codebook65536](https://huggingface.co/biohub/esmc-300M-2024-12-sae-sweep-layer23-k64-codebook65536)
 You can access an SAE model through Hugging Face using the code below:
 ```py
 import torch
-from transformers import AutoModel, AutoModelForMaskedLM, AutoTokenizer
-# GFP sequence
-sequences = ["MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK"]
-model = AutoModelForMaskedLM.from_pretrained(
-"Biohub/esmc-6b-2024-12",
-device_map="auto", # place model on GPU(s) if available
-).eval()
-tokenizer = AutoTokenizer.from_pretrained("Biohub/esmc-6b-2024-12")
-# Load SAE(s)
-sae_models = []
-sae = AutoModel.from_pretrained("Biohub/esmc-6b-2024-12-sae-sweep-layer60-k64-codebook65536", device_map="auto")
-sae_models.append(sae)
-# Add SAE(s) to the ESMC model
-model.add_sae_models(sae_models)
-inputs = tokenizer(sequences, return_tensors="pt", padding=True)
-inputs = {k: v.to(model.device) for k, v in inputs.items()} # place inputs on device
 with torch.inference_mode():
-output = model(**inputs)
-print(output.sae_outputs) # Access SAE outputs
-```
-For additional details about using SAE models, see the tutorials [here](https://colab.research.google.com/github/Biohub/esm/blob/main/cookbook/tutorials/8_protein_interpretation_sae.ipynb).
 ## Model Details
-ESMC SAEs are trained to reconstruct ESMC embeddings at a *residue level*, meaning for a protein of length L, there are L sparse vectors of SAE features. The embeddings are extracted from the hidden states after the multi-layered perceptron (MLP) layer. SAE models are trained on MLP and on state activations to provide different options for protein interpretability. Training SAE models on MLP layers can make individual features from each layer more interpretable while training the SAE models on state activations may provide a more global understanding. We use the [TopK](https://arxiv.org/abs/2412.06410) approach for training SAEs to control sparsity by only allowing the top k features at each position to be active.
 There are two critical hyperparameters for the TopK approach:
-* `k`: the number of active features per position
-* `codebook_size`: the total size of the codebook.
-The SAE codebook size (some fields use the term “dictionary”) determines how many distinct features the model can learn to represent. For protein models, the codebook size determines the balance between how accurately the model can reconstruct complex biological data and how easily interpretable those features are. As codebook size increases, the SAE can both reconstruct embeddings more faithfully and learn a higher number of specific features. However, the larger the codebook size the greater the computational expense and probability of detecting more features that are difficult to interpret or rarely activate.
-* Small codebook: The SAE is forced to group related concepts together. For example, a single feature might activate for all "metal-binding sites."
-* Large codebook: The model can "split" that general concept into more specific, granular features. Instead of one broad feature, you might get separate dedicated features for example "zinc-finger motifs," "iron-sulfur clusters," and "calcium-binding loops".
-### Model naming
-Models are named as follows: `{esmc-model}-sae-{sweep|state|residual}-layer{layer}-k{k}-codebook{codebook_size}`
-For example, to get the sweep model at ESMC 6B using the target sparsity value (k) of 64 with the 65k codebook use the following code:  `esmc-6b-2024-12-sae-sweep-layer60-k64-codebook65536`. The example code below shows how to use this model naming convention to obtain a specific model variant. See the Usage section below to see an example.
-The codebook values are as follows: 8192, 16384, 32768, 65536, 131072.
-### SAE models
-The table below lists the variations for the SAE models. Each SAE variant is trained on a base model, either MLP or state activations, a specific codebook size, and a specific layer of the base model. Thus, the first row of the table indicates that there are 81 (one model for every 1-90 transformer layers, plus an input embedding layer) SAE variants trained on ESMC 6B using state activations and a codebook of 16k features.
-| Model | MLP or State Activations | Codebooks | Layers |
-| :---- | :---- | :---- | :---- |
-| ESMC 6B | state | 16k | A model for every 1-80 transformer layers \+ input embedding layer |
-| ESMC 6B | state | 131k | A model for every 1-80 transformer layers \+ input embedding layer |
-| ESMC 6B | MLP | 131k | A model for every 1-80 transformer layers \+ 1 input embedding layer |
-| ESMC 600M | state | 16k | A model for every 1-37 transformer layers \+ 1 input embedding layer |
-| ESMC 600M | MLP | 131k | A model for every 1-37 transformer layers \+ 1 input embedding layer |
-| ESMC 300M | state | 16k | 1-31 transformer layers \+ 1 input embedding layer |
-| ESMC 300M | MLP | 131k | A model for every 1-31 transformer layers \+ 1 input embedding layer |
-### Sweep variants
-In addition to the SAEs listed above, there are sweeps trained across the different ESMC model variants.The table below summarizes the ESMC variant and the layer used for training. We targeted a 75% depth after testing showed that middle-to-late layers yielded the most pertinent feature information, similar to findings from other large language learning models. For each layer, there are 30 pre-trained models covering every combination of five codebook sizes (8k ,16k, 32k, 65k, 131k) and six target sparsities (16, 32, 64 ,128, 256, 512).
-| Base model variant | Training layer |
-| :---- | :---- |
-| ESMC 6B | 60 |
-| ESMC 600M | 27 |
-| ESMC 300M | 23 |
-### ESM Atlas
-The SAE model `esmc-6b-2024-12-sae-sweep-layer60-k64-codebook16384` is the model that provides users with interpretable, agent-generated feature descriptions that are detailed [here](https://huggingface.co/datasets/biohub/ESMC-SAE-Features). Users can access these feature descriptions through the [ESM Atlas](https://biohub.ai/esmc/atlas) or through the Biohub Platform Biohub’s open platform with the code [here](https://github.com/evolutionaryscale/esm/blob/cookbook/snippets/sae_example.py).
-### Normalization Statistics
-The following four SAE models include max/IDF normalization statistics:
-* `esmc-600m-2024-12-sae-sweep-layer27-k64-codebook16384`
-* `esmc-600m-2024-12-sae-sweep-layer27-k64-codebook65536`
-* `esmc-6b-2024-12-sae-sweep-layer60-k64-codebook16384`
-* `esmc-6b-2024-12-sae-sweep-layer60-k64-codebook65536`
-Normalization statistics are computed by running each model over UniRef90 and recording two quantities per feature: (1) the maximum activation value observed across the entire dataset, and (2) the Inverse Document Frequency (IDF), defined as log(N / f), where N is the total number of proteins and f is the number of proteins in which the feature was active (non-zero).
-At inference  activations are normalized as: (activation / max) \* idf. Dividing by the maximum scales each feature's output to the range \[0, 1\], making features more comparable to each other. Multiplying by IDF then upweights rare, distinctive features and downweights ubiquitous ones, making it easier for users to distinguish biologically relevant features.

 ---
+license:
+- mit
 - other
 license_link: https://github.com/Biohub/esm/blob/main/THIRD_PARTY_NOTICE.md
+library_name: transformers
+language: en
+tags:
+- biology
+- esm
+- protein
+- sparse-autoencoder
+- interpretability
+- protein-embeddings
+- feature-extraction
+- protein-language-model
+- unsupervised-learning
 - transformers
 ---
+# ESMC Sparse Autoencoders
+This model card gives an overview of the intended use of the ESMC SAE models and shows how to access them; it is not tied to a specific model and does not host model weights. To access each SAE model collection, use the links below:
+- [ESMC SAEs for hidden states (all layers)](https://huggingface.co/collections/biohub/esmc-saes-for-hidden-states-all-layers)
+- [ESMC SAEs for MLP outputs (all layers)](https://huggingface.co/collections/biohub/esmc-saes-for-mlp-outputs-all-layers)
+- [ESMC SAEs for one layer (different sparsity / codebook size)](https://huggingface.co/collections/biohub/esmc-saes-for-one-layer-different-sparsity-codebook-size)
+The ESMC sparse autoencoders (SAEs) are unsupervised neural networks trained to decompose the learned internal representations from the ESMC model variants into a sparse representation space comprising more biologically interpretable features, revealing what the model "sees" of the user's protein input. Each feature is encouraged to be approximately monosemantic (capturing one interpretable concept) through a large feature space combined with a sparsity constraint, and may represent a specific biologically relevant property of the protein, such as a zinc binding site, beta barrel structure, or transmembrane helix.
+Building on top of the ESMC 6B SAEs, the ESM Atlas is a map of 6.8 billion proteins covering the full breadth of life's biodiversity and more than one billion predicted structures. The SAEs enable translation of the model's internal representations into ~16,000 interpretable biological features. Learn more about how to use the ESM Atlas on the Biohub Platform.
+Read more about ESMC and SAEs in our [paper](https://biohub.ai/papers/esm_protein.pdf).
 ## Intended Use
+- Decomposing ESMC embeddings into interpretable features
+- Visualizing feature activations on sequences and structures
 ## Usage
+The SAE model `ESMC-6B-sae-layer60-k64-codebook16384` provides users with interpretable, agent-generated feature descriptions. Users can access these feature descriptions through the ESM Atlas or through the Biohub Platform.
+While all SAE models can be accessed through Hugging Face, only the following five SAE models are available through the Biohub Platform:
+- `ESMC-6B-sae-layer60-k64-codebook16384`
+- `ESMC-6B-sae-layer60-k64-codebook65536`
+- `ESMC-600M-sae-layer27-k64-codebook16384`
+- `ESMC-600M-sae-layer27-k64-codebook65536`
+- `ESMC-300M-sae-layer23-k64-codebook65536`
 You can access an SAE model through Hugging Face using the code below:
 ```py
 import torch
+from transformers import AutoModel, AutoTokenizer
+# Example protein sequence (single-letter amino acid codes)
+sequence = "MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL"
+# Load the base ESMC model (placed on GPU(s) if available) and its tokenizer
+model = AutoModel.from_pretrained("biohub/ESMC-6B", device_map="auto").eval()
+tokenizer = AutoTokenizer.from_pretrained("biohub/ESMC-6B")
+# Load the hidden-state SAE repository, fetching only the weights for layers 30 and 60
+sae = AutoModel.from_pretrained(
+    "biohub/ESMC-6B-sae-k64-codebook16384",
+    allow_patterns=["config.json", "layer_30.safetensors", "layer_60.safetensors"],
+    device=model.device,
+)
+sae.initialize_layers([30, 60])
+# Attach the selected per-layer SAEs to the ESMC model
+model.add_sae_models([sae.layers["30"], sae.layers["60"]])
+# Tokenize the sequence and move the inputs to the model's device
+inputs = tokenizer(sequence, return_tensors="pt", padding=True)
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
 with torch.inference_mode():
+    output = model(**inputs)
+# sparse.coo tensor of shape (batch, seq_len, codebook_size)
+print(output["sae_outputs"]["layer60"].shape)
+```
 ## Model Details
+ESMC SAEs are trained to reconstruct ESMC embeddings at the residue level, meaning that for a protein of length `L` there are `L` sparse vectors of SAE features. For the models hosted on the Biohub Platform, the embeddings are extracted from the hidden states. To provide different options for protein interpretability, our full set of SAE models contains both models trained on hidden states and models trained directly on the MLP outputs. Training SAE models on MLP outputs generates features specific to that layer's computation, while training them on hidden states may provide a more global understanding. We use the TopK approach for training SAEs, which controls sparsity by allowing only the top `k` features at each position to be active.
 There are two critical hyperparameters for the TopK approach:
+- `k`: the number of active features per position
+- `codebook_size`: total number of features the SAE can learn
+With smaller codebooks, the SAE may group related concepts together. For example, a single feature might activate for all metal-binding sites. With larger codebooks, the model can split general concepts into more granular features. For example, the model may learn dedicated features for zinc-finger motifs, iron-sulfur clusters, and calcium-binding loops.
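To make the roles of `k` and `codebook_size` concrete, here is a minimal sketch of the TopK selection step for a single residue position. It is an illustration only, not the ESMC SAE implementation: the function name `topk_sparsify` and the toy numbers are hypothetical, and the encoder and decoder projections that surround this step in a real SAE are omitted.

```python
def topk_sparsify(pre_activations, k):
    """Keep the k largest pre-activations and set all others to zero.

    `pre_activations` holds one value per codebook entry, so its length
    plays the role of `codebook_size` (hypothetical toy setup).
    """
    if not 0 < k <= len(pre_activations):
        raise ValueError("k must be between 1 and the codebook size")
    # Indices sorted by value, largest first; ties keep the lower index first
    # because Python's sort is stable.
    ranked = sorted(
        range(len(pre_activations)),
        key=lambda i: pre_activations[i],
        reverse=True,
    )
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_activations)]


# Toy example: codebook_size = 8, k = 3
pre = [0.1, 2.5, -0.3, 0.9, 1.7, 0.0, 0.4, -1.2]
sparse = topk_sparsify(pre, k=3)
print(sparse)
```

In this sketch, increasing `k` lets more features fire at each position, while a longer input list (a larger codebook) gives the SAE more candidate features to choose from.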
+### Model Naming
+There are three different families of models based on the SAE training target.
+**Hidden states — SAE model for every layer.** The first family is trained on hidden states at every layer of the respective ESMC models. The naming convention is:
+```
+{esmc-model}-sae-k64-codebook16384
+```
+The options for `esmc-model` are:
+- `ESMC-300M`
+- `ESMC-600M`
+- `ESMC-6B`
+**MLP outputs — SAE model for every layer.** The second family is trained on the per-layer MLP output (before the residual connection) at every layer of the respective ESMC models. The naming convention is:
+```
+{esmc-model}-sae-mlp-k64-codebook131072
+```
+The options for `esmc-model` are:
+- `ESMC-300M`
+- `ESMC-600M`
+- `ESMC-6B`
+**Layer-specific — SAE model for every combination of `k` and `codebook_size`.** The last family is trained on one specific layer of the respective ESMC models with different top-`k` and codebook sizes. With these models, we targeted a 75% depth after various analyses showed that representations at this depth are often the most generalizable to a variety of downstream tasks, similar to findings from other large language models. The naming convention is:
+```
+{esmc-model}-sae-layer{layer_num}-k{k}-codebook{codebook}
+```
+The options for `esmc-model`, with the corresponding `layer_num`, are:
+| `esmc-model` | `layer_num` |
+| :---- | ----: |
+| `ESMC-300M` | 23 |
+| `ESMC-600M` | 27 |
+| `ESMC-6B` | 60 |
+The options for `k` are: 16, 32, 64, 128, 256, 512.
+The options for `codebook` are: 8192, 16384, 32768, 65536, 131072.
+For example, to load the SAE trained on hidden states at layer 60 in ESMC 6B with `k=64` and the 65k codebook, use `ESMC-6B-sae-layer60-k64-codebook65536`.
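The three naming conventions above can be expressed as small string-building helpers. This is a sketch under the assumption that the templates and option lists in this section are complete; the helper names (`hidden_state_name`, `mlp_name`, `layer_specific_name`, `all_layer_specific_names`) are hypothetical and not part of any Biohub library, and the sketch does not check that a given repository actually exists on Hugging Face.

```python
from itertools import product

# Values taken from the naming tables in this section.
SWEEP_LAYER = {"ESMC-300M": 23, "ESMC-600M": 27, "ESMC-6B": 60}
K_VALUES = (16, 32, 64, 128, 256, 512)
CODEBOOK_SIZES = (8192, 16384, 32768, 65536, 131072)


def _check_model(esmc_model):
    if esmc_model not in SWEEP_LAYER:
        raise ValueError(f"unknown esmc-model: {esmc_model!r}")


def hidden_state_name(esmc_model):
    """Family 1: hidden states, one SAE per layer."""
    _check_model(esmc_model)
    return f"{esmc_model}-sae-k64-codebook16384"


def mlp_name(esmc_model):
    """Family 2: MLP outputs, one SAE per layer."""
    _check_model(esmc_model)
    return f"{esmc_model}-sae-mlp-k64-codebook131072"


def layer_specific_name(esmc_model, k, codebook):
    """Family 3: one layer, with a chosen k and codebook size."""
    _check_model(esmc_model)
    if k not in K_VALUES or codebook not in CODEBOOK_SIZES:
        raise ValueError("unsupported k or codebook size")
    layer = SWEEP_LAYER[esmc_model]
    return f"{esmc_model}-sae-layer{layer}-k{k}-codebook{codebook}"


def all_layer_specific_names(esmc_model):
    """Every (k, codebook) combination for one base model."""
    return [
        layer_specific_name(esmc_model, k, c)
        for k, c in product(K_VALUES, CODEBOOK_SIZES)
    ]


print(layer_specific_name("ESMC-6B", 64, 65536))
print(len(all_layer_specific_names("ESMC-6B")))
```

Six values of `k` and five codebook sizes give 30 layer-specific names per base model.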
+## Feature Descriptions
+The SAE model `ESMC-6B-sae-layer60-k64-codebook16384` is the model most heavily studied in our paper and was used to generate features for the ESM Atlas. We also created agent-generated feature descriptions for this model. Users can access these feature descriptions through the ESM Atlas or through the Biohub Platform API.
+## Normalization Statistics
+Only the `ESMC-6B-sae-layer60-k64-codebook16384` model has accessible normalization statistics. The other models hosted on the Biohub Platform also offer the option to normalize SAE features, but their statistics are not currently accessible.
+Normalization statistics are computed by using each model to compute SAE features for all proteins in UniRef90 and recording two quantities per feature: (1) the maximum activation value observed across the entire dataset, and (2) the Inverse Document Frequency (IDF), defined as `log(N / f)`, where `N` is the total number of proteins and `f` is the number of proteins in which the feature was active (non-zero).
+At inference, activations are normalized as `(activation / max) * idf`. Dividing by the maximum scales each feature's output to the range `[0, 1]`, making features more comparable to each other. Multiplying by IDF then upweights rare, distinctive features and downweights ubiquitous ones, making it easier to distinguish biologically relevant features. These statistics can be accessed through the feature-description API.
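The normalization described above can be sketched in a few lines. The function names and the toy numbers are hypothetical, and the sketch assumes the natural logarithm for `log(N / f)`, which the text does not specify; the real statistics come from UniRef90 and are not reproduced here.

```python
import math


def idf(num_proteins, num_active):
    """Inverse document frequency log(N / f); natural log is an assumption."""
    if not 0 < num_active <= num_proteins:
        raise ValueError("need 0 < f <= N")
    return math.log(num_proteins / num_active)


def normalize(activation, max_activation, idf_value):
    """Apply (activation / max) * idf to a single feature activation."""
    if max_activation <= 0:
        raise ValueError("max_activation must be positive")
    return (activation / max_activation) * idf_value


# Toy numbers (hypothetical): two features with the same raw activation,
# one active in few proteins and one active in half of them.
N = 1_000_000
rare = normalize(2.0, max_activation=4.0, idf_value=idf(N, 100))
common = normalize(2.0, max_activation=4.0, idf_value=idf(N, 500_000))
print(rare > common)
```

Only the `activation / max` factor is bounded by `[0, 1]`; after multiplying by IDF, a rare feature can exceed 1, which is what makes it stand out against ubiquitous ones.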
+## Frontier Safety
+Biohub has established a safety team to assess the benefits and potential risks of our models and tools prior to release, and to develop mitigations where necessary. Informed by our risk assessments, we are releasing the source code and model weights for our ESMC SAEs. We are also releasing our ESM Atlas dataset openly.
+Biohub.ai Platform: We implement guardrails that detect and restrict the use of keywords and sequences corresponding to controlled pathogens and toxins on our freely accessible platform. For further details regarding these guardrails, please refer to our Biohub Platform Resources page.