---
license:
- mit
- other
license_link: https://github.com/Biohub/esm/blob/main/THIRD_PARTY_NOTICE.md
library_name: transformers
language: en
tags:
- biology
- esm
- protein
- sparse-autoencoder
- interpretability
- protein-embeddings
- feature-extraction
- protein-language-model
- unsupervised-learning
- transformers
---

# ESMC Sparse Autoencoders

This model card gives an overview of the intended use of the ESMC SAE models and shows how to access them. It does not describe one specific model or its weights. To access each SAE model collection, use the links below:

- [ESMC SAEs for hidden states (all layers)](https://huggingface.co/collections/biohub/esmc-saes-for-hidden-states-all-layers)
- [ESMC SAEs for MLP outputs (all layers)](https://huggingface.co/collections/biohub/esmc-saes-for-mlp-outputs-all-layers)
- [ESMC SAEs for one layer (different sparsity / codebook size)](https://huggingface.co/collections/biohub/esmc-saes-for-one-layer-different-sparsity-codebook-size)

The ESMC sparse autoencoders (SAEs) are unsupervised neural networks. They are trained to decompose the internal representations learned by the ESMC model variants into a sparse space of more biologically interpretable features, revealing what the model "sees" in a user's protein input. A large feature space combined with a sparsity constraint encourages each feature to be approximately monosemantic (capturing one interpretable concept). As a result, a feature may represent a specific biologically relevant property of the protein, such as a zinc-binding site, a beta-barrel structure, or a transmembrane helix.

The ESM Atlas is built on top of the ESMC 6B SAEs. It is a map of 6.8 billion proteins that covers the full breadth of life's biodiversity and includes more than one billion predicted structures. The SAEs translate the model's internal representations into ~16,000 interpretable biological features. Learn more about how to use the ESM Atlas on the Biohub Platform.

Read more about ESMC and SAEs in our [paper](https://biohub.ai/papers/esm_protein.pdf).

## Intended Use

- Decomposing ESMC embeddings into interpretable features
- Visualizing feature activations on sequences and structures

## Usage

The SAE model `ESMC-6B-sae-layer60-k64-codebook16384` comes with interpretable, agent-generated feature descriptions. Users can access these descriptions through the ESM Atlas or the Biohub Platform.

All SAE models can be accessed through Hugging Face, but only the following five are available through the Biohub Platform:

- `ESMC-6B-sae-layer60-k64-codebook16384`
- `ESMC-6B-sae-layer60-k64-codebook65536`
- `ESMC-600M-sae-layer27-k64-codebook16384`
- `ESMC-600M-sae-layer27-k64-codebook65536`
- `ESMC-300M-sae-layer23-k64-codebook65536`

You can load an SAE model from Hugging Face with the code below:

```py
import torch
from transformers import AutoModel, AutoTokenizer

sequence = "MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL"

# Load the base ESMC model and its tokenizer
model = AutoModel.from_pretrained("biohub/ESMC-6B", device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained("biohub/ESMC-6B")

# Load the per-layer hidden-state SAE repo, fetching only the config
# and the weight files for layers 30 and 60
sae = AutoModel.from_pretrained(
    "biohub/ESMC-6B-sae-k64-codebook16384",
    allow_patterns=["config.json", "layer_30.safetensors", "layer_60.safetensors"],
    device=model.device,
)
# Build the SAEs for layers 30 and 60 and attach them to the ESMC model
sae.initialize_layers([30, 60])
model.add_sae_models([sae.layers["30"], sae.layers["60"]])

# Tokenize the sequence and move the inputs to the model's device
inputs = tokenizer(sequence, return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.inference_mode():
    output = model(**inputs)

# sparse.coo tensor of shape (batch, seq_len, codebook_size)
print(output["sae_outputs"]["layer60"].shape)
```
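
Because the SAE output is sparse, a typical next step is to list the strongest features at each position. The sketch below uses a hypothetical helper, `top_features_per_residue`, which is not part of the ESMC or `transformers` API. It assumes the SAE output is a PyTorch sparse COO tensor, as the comment in the example suggests. It also assumes the tensor's non-zero entries have been exported as plain lists with `t.coalesce().indices().tolist()` and `t.coalesce().values().tolist()`. Positions index tokens, so they may be offset from residue numbers by any special tokens the tokenizer adds.

```python
from collections import defaultdict


def top_features_per_residue(indices, values, batch_index=0, top_n=3):
    """Group the non-zero entries of a (batch, seq_len, codebook_size) COO tensor
    by position and return the `top_n` strongest features at each position.

    `indices` is [batch_ids, positions, feature_ids] (the COO index layout) and
    `values` holds the matching activations.
    """
    per_residue = defaultdict(list)
    for b, pos, feat, val in zip(indices[0], indices[1], indices[2], values):
        if b == batch_index:
            per_residue[pos].append((feat, val))
    # Sort each position's active features by activation, strongest first
    return {
        pos: sorted(feats, key=lambda fv: fv[1], reverse=True)[:top_n]
        for pos, feats in sorted(per_residue.items())
    }


# Toy COO entries: position 0 activates features 5 and 9, position 2 activates feature 1
indices = [[0, 0, 0], [0, 0, 2], [5, 9, 1]]
values = [0.2, 1.5, 0.7]
print(top_features_per_residue(indices, values))
# {0: [(9, 1.5), (5, 0.2)], 2: [(1, 0.7)]}
```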

## Model Details

ESMC SAEs are trained to reconstruct ESMC embeddings at the residue level: for a protein of length L, there are L sparse vectors of SAE features. For the models hosted on the Biohub Platform, the embeddings are extracted from the hidden states. To provide different options for protein interpretability, our full set of SAE models includes both models trained on hidden states and models trained directly on MLP outputs. SAEs trained on MLP outputs learn features specific to that layer's computation, while SAEs trained on hidden states may provide a more global view. We train the SAEs with the TopK approach, which controls sparsity by allowing only the top `k` features at each position to be active.

There are two critical hyperparameters for the TopK approach:

- `k`: the number of active features per position
- `codebook_size`: the total number of features the SAE can learn
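
To make `k` and `codebook_size` concrete, the following pure-Python sketch applies a generic TopK SAE to a single embedding. For a protein of length L, it would be applied to each of the L residue embeddings. The sketch assumes the common TopK formulation: a linear encoder, then keep the `k` largest pre-activations, apply a ReLU, and decode linearly. The exact ESMC SAE parameterization, such as bias placement or input normalization, may differ. The function name `topk_sae` and the toy weights are illustrative only.

```python
def topk_sae(x, w_enc, b_enc, w_dec, b_dec, k):
    """Encode one embedding `x` (length d_model) into a sparse code over
    codebook_size features, keep the top `k`, and reconstruct `x`.

    w_enc: codebook_size rows, each of length d_model
    w_dec: d_model rows, each of length codebook_size
    """
    # Pre-activations: one score per codebook feature
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_enc, b_enc)]
    # Keep only the k highest-scoring features; the ReLU drops non-positive ones
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    code = {i: pre[i] for i in top if pre[i] > 0}
    # Reconstruction: decoder bias plus the decoder columns of the active features
    recon = [b + sum(row[i] * a for i, a in code.items()) for row, b in zip(w_dec, b_dec)]
    return code, recon


# Toy example: d_model = 2, codebook_size = 4, k = 2
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w_dec = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0]]
code, recon = topk_sae([2.0, 1.0], w_enc, [0.0] * 4, w_dec, [0.0, 0.0], k=2)
print(code)   # {2: 3.0, 0: 2.0}
print(recon)  # [2.0, 1.5]
```

Feature 1 has a positive pre-activation (1.0) but is dropped because it is not among the top 2. At the scale of these models, `k=64` with a 16,384-feature codebook leaves at most 64 of 16,384 features (about 0.4%) non-zero at each residue, so a sparse output format fits well.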

With smaller codebooks, the SAE may group related concepts together; for example, a single feature might activate for all metal-binding sites. With larger codebooks, the SAE can split general concepts into more granular features; for example, it may learn dedicated features for zinc-finger motifs, iron-sulfur clusters, and calcium-binding loops.

### Model Naming

There are three families of models, based on the SAE training target.

**Hidden states: SAE model for every layer.** The first family is trained on the hidden states at every layer of the respective ESMC models. The naming convention is:

```
{esmc-model}-sae-k64-codebook16384
```

The options for `esmc-model` are:

- `ESMC-300M`
- `ESMC-600M`
- `ESMC-6B`

**MLP outputs: SAE model for every layer.** The second family is trained on the per-layer MLP output (before the residual connection) at every layer of the respective ESMC models. The naming convention is:

```
{esmc-model}-sae-mlp-k64-codebook131072
```

The options for `esmc-model` are:

- `ESMC-300M`
- `ESMC-600M`
- `ESMC-6B`

**Layer-specific: SAE model for every combination of `k` and `codebook_size`.** The last family is trained on one specific layer of the respective ESMC models, with different top-`k` values and codebook sizes. For these models, we targeted a layer at 75% of the model's depth, because various analyses showed that representations at this depth often generalize best across a variety of downstream tasks, consistent with findings for other large language models. The naming convention is:

```
{esmc-model}-sae-layer{layer_num}-k{k}-codebook{codebook}
```

The options for `esmc-model`, with the corresponding `layer_num`, are:

| `esmc-model` | `layer_num` |
| :---- | ----: |
| `ESMC-300M` | 23 |
| `ESMC-600M` | 27 |
| `ESMC-6B` | 60 |

The options for `k` are 16, 32, 64, 128, 256, and 512.

The options for `codebook` are 8192, 16384, 32768, 65536, and 131072.

For example, to load the SAE trained on hidden states at layer 60 of ESMC 6B with `k=64` and the 65,536-feature codebook, use `ESMC-6B-sae-layer60-k64-codebook65536`.
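
The naming conventions above can be captured in a small helper that also rejects unsupported combinations. `sae_repo_name` is a hypothetical convenience function, not part of any Biohub library. Prepend `biohub/` to its result to get a Hugging Face repo ID, as in the usage example. For the two every-layer families, `k` and `codebook` are fixed, so those arguments are ignored.

```python
# Allowed values taken from the naming conventions above
ESMC_LAYERS = {"ESMC-300M": 23, "ESMC-600M": 27, "ESMC-6B": 60}  # model -> layer_num
K_OPTIONS = {16, 32, 64, 128, 256, 512}
CODEBOOK_OPTIONS = {8192, 16384, 32768, 65536, 131072}


def sae_repo_name(esmc_model, family, k=64, codebook=16384):
    """Build an SAE model name (hypothetical helper).

    family: "hidden" (every layer), "mlp" (every layer), or "layer" (layer-specific).
    `k` and `codebook` are only used by the layer-specific family.
    """
    if esmc_model not in ESMC_LAYERS:
        raise ValueError(f"unknown ESMC model: {esmc_model}")
    if family == "hidden":
        return f"{esmc_model}-sae-k64-codebook16384"
    if family == "mlp":
        return f"{esmc_model}-sae-mlp-k64-codebook131072"
    if family == "layer":
        if k not in K_OPTIONS or codebook not in CODEBOOK_OPTIONS:
            raise ValueError(f"unsupported k={k} or codebook={codebook}")
        layer = ESMC_LAYERS[esmc_model]
        return f"{esmc_model}-sae-layer{layer}-k{k}-codebook{codebook}"
    raise ValueError(f"unknown family: {family}")


print(sae_repo_name("ESMC-6B", "layer", k=64, codebook=65536))
# ESMC-6B-sae-layer60-k64-codebook65536
```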

## Feature Descriptions

The SAE model `ESMC-6B-sae-layer60-k64-codebook16384` is the model studied most heavily in our paper and was used to generate features for the ESM Atlas. We also created agent-generated feature descriptions for this model, which users can access through the ESM Atlas or the Biohub Platform API.

## Normalization Statistics

Only the `ESMC-6B-sae-layer60-k64-codebook16384` model has accessible normalization statistics. The other models hosted on the Biohub Platform also offer the option to normalize SAE features, but their statistics are not currently accessible.

To compute the normalization statistics, each model is run over all proteins in UniRef90 to produce SAE features. Two quantities are then recorded per feature:

1. The maximum activation value observed across the entire dataset.
2. The inverse document frequency (IDF), defined as `log(N / f)`, where `N` is the total number of proteins and `f` is the number of proteins in which the feature was active (non-zero).
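
Below is a minimal sketch of how the two statistics could be computed in a single streaming pass over a dataset. It assumes the following, none of which is specified above:

- Each protein's SAE output is a list of per-residue dicts mapping feature index to activation.
- Activations are non-negative, so a running maximum can start at 0.
- `log` is the natural logarithm.

`normalization_stats` is a hypothetical name. The IDF counts proteins, not residues: a feature that fires at several residues of one protein counts once.

```python
import math


def normalization_stats(dataset, codebook_size):
    """Compute per-feature (max activation, IDF) over an iterable of proteins.

    Each protein is a list of per-residue sparse codes (dict: feature -> activation).
    """
    n_proteins = 0
    max_act = [0.0] * codebook_size  # assumes non-negative activations
    doc_freq = [0] * codebook_size   # number of proteins in which each feature fired
    for protein in dataset:
        n_proteins += 1
        active_here = set()
        for residue_code in protein:
            for feat, act in residue_code.items():
                if act != 0:
                    active_here.add(feat)
                    max_act[feat] = max(max_act[feat], act)
        for feat in active_here:
            doc_freq[feat] += 1
    # IDF = log(N / f); a feature that never fires has no defined IDF (None)
    idf = [math.log(n_proteins / f) if f else None for f in doc_freq]
    return max_act, idf


proteins = [
    [{0: 1.0}, {0: 3.0, 2: 0.5}],  # protein 1: features 0 and 2 (feature 0 at two residues)
    [{0: 2.0}],                    # protein 2: feature 0
    [{1: 4.0}],                    # protein 3: feature 1
    [{0: 0.5}],                    # protein 4: feature 0
]
max_act, idf = normalization_stats(proteins, codebook_size=4)
print(max_act)                                              # [3.0, 4.0, 0.5, 0.0]
print([None if v is None else round(v, 3) for v in idf])    # [0.288, 1.386, 1.386, None]
```

Because the function consumes `dataset` once, it could also be fed a generator that yields proteins one at a time, which avoids holding the whole dataset in memory.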

At inference, activations are normalized as `(activation / max) * idf`. Dividing by the maximum scales each feature's output to the range `[0, 1]` relative to the largest value observed in UniRef90, making features more comparable to each other. Multiplying by the IDF then upweights rare, distinctive features and downweights ubiquitous ones, making biologically relevant features easier to pick out. These statistics can be accessed through the feature-description API.
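
The normalization itself is a per-feature rescaling, sketched below for one residue's sparse code. `normalize_code` is a hypothetical helper, and the list-based layout of the statistics is an assumption, not the format returned by the feature-description API. Features without usable statistics (a zero maximum or no IDF) are skipped here; that is one possible design choice, not documented behavior.

```python
def normalize_code(code, max_act, idf):
    """Apply (activation / max) * idf to one residue's sparse code.

    code: dict mapping feature index -> raw activation
    max_act, idf: per-feature lists of normalization statistics (assumed layout)
    """
    out = {}
    for feat, act in code.items():
        if not max_act[feat] or idf[feat] is None:
            continue  # no usable statistics for this feature
        out[feat] = (act / max_act[feat]) * idf[feat]
    return out


max_act = [3.0, 4.0, 0.5, 0.0]
idf = [0.5, 2.0, 2.0, None]
print(normalize_code({0: 1.5, 1: 4.0, 3: 0.7}, max_act, idf))  # {0: 0.25, 1: 2.0}
```

Feature 1 fires at its recorded maximum and is rare (IDF 2.0), so it scores 2.0. Feature 0 is at half its maximum and common (IDF 0.5), so it scores only 0.25.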

## Frontier Safety

Biohub has established a safety team to assess the benefits and potential risks of our models and tools before release and to develop mitigations where necessary. Informed by our risk assessments, we are releasing the source code and model weights for our ESMC SAEs. We are also releasing our ESM Atlas dataset openly.

Biohub.ai Platform: On our freely accessible platform, we implement guardrails that detect and restrict the use of keywords and sequences corresponding to controlled pathogens and toxins. For further details about these guardrails, please refer to the Resources page of the Biohub Platform.