ENCODE BPNet Atlas

As part of the ENCODE 4 Project, we trained BPNet models on 2,339 ENCODE transcription factor ChIP-seq experiments spanning 788 targets across 175 biosamples. Here, we provide all models for open-source use.

For more information about the models, see:

  • Main ENCODE 4 Paper
  • A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays (Deshpande et al., Zenodo 2025)
  • Base-resolution models of transcription-factor binding reveal soft motif syntax (Avsec et al., Nat Genet 2021)

BPNet model: EGR1 ChIP-seq in Ishikawa (ENCSR000BSQ)

  • Model: BPNet
  • Assay: TF ChIP-seq
  • Target: EGR1
  • Experiment: ENCSR000BSQ
  • Model annotation: ENCSR133QFY
  • Biosample: Ishikawa (Full name: Homo sapiens Ishikawa)
  • Cell slim(s): cancer cell
  • Organ slim(s): uterus
  • Developmental slim(s): None
  • System slim(s): reproductive system
  • Assembly: hg38

QC

  • Status: passed
  • Notes: Found direct motif (counts, profile)

Directory structure

The models were trained with 5-fold cross-validation. Each fold_*/ directory contains the trained BPNet model in two formats, shown here for fold_0:

  • fold_0/model.h5: BPNet model in .h5 (Keras) format
  • fold_0/saved_model/: BPNet model in TensorFlow SavedModel format (a directory; load directly)
  • config.json: training / architecture parameters

Instructions

BPNet takes a one-hot DNA sequence plus control (bias) inputs and predicts stranded profile logits and total logcounts. The control inputs come from the matched WCE/Input DNA control and can be passed as zeros.
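The inputs described above can be prepared with plain NumPy. The following is a minimal sketch, not part of the released code: `one_hot_encode` and `zero_controls` are hypothetical helper names, and leaving `N` (or any other non-ACGT character) as an all-zero row is an assumption rather than documented behaviour.

```python
import numpy as np

# Hypothetical helper (not from the BPNet code base).
def one_hot_encode(seqs, seq_len=2114):
    """Encode DNA strings as an (N, seq_len, 4) float32 array in A, C, G, T order."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((len(seqs), seq_len, 4), dtype=np.float32)
    for i, seq in enumerate(seqs):
        if len(seq) != seq_len:
            raise ValueError(f"sequence {i} has length {len(seq)}, expected {seq_len}")
        for j, base in enumerate(seq.upper()):
            k = index.get(base)
            if k is not None:  # assumption: N and other codes stay all-zero
                out[i, j, k] = 1.0
    return out

# Hypothetical helper: all-zero control (bias) inputs.
def zero_controls(n, output_len=1000):
    profile_bias_input = np.zeros((n, output_len, 2), dtype=np.float32)
    counts_bias_input = np.zeros((n, 2), dtype=np.float32)
    return profile_bias_input, counts_bias_input

# Toy 2,114 bp sequence: 528 repeats of ACGT followed by "AN"
sequence = one_hot_encode(["ACGT" * 528 + "AN"])
profile_bias_input, counts_bias_input = zero_controls(len(sequence))
```

In practice the 2,114 bp windows would be extracted from the hg38 reference around the regions of interest; the toy string here only demonstrates the expected shapes.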

1. Loading the SavedModel and making predictions

import numpy as np
import tensorflow as tf
from scipy.special import logsumexp

model = tf.saved_model.load("fold_0/saved_model")
# sequence: (N, 2114, 4) one-hot [A,C,G,T]
# profile_bias_input: (N, 1000, 2) per-base profile bias from WCE/Input control, or zeros
# counts_bias_input:  (N, 2) log2 total counts from WCE/Input control, or zeros
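# Pass tensors rather than raw NumPy arrays; some TensorFlow versions require this
predictions = model.signatures["serving_default"](
    sequence=tf.constant(sequence, dtype=tf.float32),
    profile_bias_input_0=tf.constant(profile_bias_input, dtype=tf.float32),
    counts_bias_input_0=tf.constant(counts_bias_input, dtype=tf.float32))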
# predictions["profile_predictions"]:   (N, 1000, 2) logits (strands NOT independent)
# predictions["logcounts_predictions"]: (N, 1) total logcount

output_len = 1000
def vectorized_prediction_to_profile(predictions):
    # Accepts TF tensors or NumPy arrays
    logits_arr = np.asarray(predictions["profile_predictions"])    # (N, output_len, 2)
    counts_arr = np.asarray(predictions["logcounts_predictions"])  # (N, 1)
    n = len(logits_arr)
    # Softmax jointly over all positions and both strands (2 * output_len values)
    flat_logits = np.reshape(logits_arr, [n, output_len * 2])
    probs = np.exp(flat_logits - logsumexp(flat_logits, axis=1, keepdims=True))
    # Scale the probabilities by the predicted total counts
    profile = (np.exp(counts_arr).reshape([n, 1]) * probs).reshape([n, output_len, 2])
    plus, minus = profile[:, :, 0], profile[:, :, 1]
    return plus, minus, counts_arr

plus, minus, logcounts = vectorized_prediction_to_profile(predictions)
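As a sanity check on the conversion, the sketch below restates it using NumPy only and runs it on random toy arrays (no model required). `profile_from_logits` is a hypothetical name. The point it illustrates is that the softmax is taken jointly over positions and strands, so the two strands together sum to exp(logcounts) for each example.

```python
import numpy as np

# Hypothetical NumPy-only restatement of the logits-to-profile conversion.
def profile_from_logits(logits, logcounts):
    """Joint softmax over positions and strands, scaled by exp(logcounts)."""
    logits = np.asarray(logits, dtype=np.float64)
    n, length, strands = logits.shape
    flat = logits.reshape(n, length * strands)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    total = np.exp(np.asarray(logcounts, dtype=np.float64)).reshape(n, 1)
    return (probs * total).reshape(n, length, strands)

# Toy stand-ins for model outputs
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(3, 1000, 2))
toy_logcounts = rng.normal(loc=5.0, size=(3, 1))

toy_profile = profile_from_logits(toy_logits, toy_logcounts)
# Both strands together sum to the predicted total counts
assert np.allclose(toy_profile.sum(axis=(1, 2)), np.exp(toy_logcounts[:, 0]))
```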

2. Loading the .h5 (Keras) and making predictions

import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as kb
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import CustomObjectScope
from bpnet.model.custommodel import CustomModel

def get_model(model_path):
    with CustomObjectScope({"kb": kb, "tf": tf, "CustomModel": CustomModel}):
        return load_model(model_path)

model = get_model("fold_0/model.h5")
N = sequence.shape[0]
predictions = model.predict([
    sequence,                     # (N, 2114, 4)
    np.zeros((N, 1000, 2)),       # profile_bias_input (or real WCE/Input control values)
    np.zeros((N, 2))])            # counts_bias_input  (or real control log2 counts)
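# predictions[0]: (N, 1000, 2) logits;  predictions[1]: (N, 1) logcounts
# vectorized_prediction_to_profile() from section 1 indexes by key, so wrap the list in a dict
plus, minus, logcounts = vectorized_prediction_to_profile({
    "profile_predictions": predictions[0],
    "logcounts_predictions": predictions[1]})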
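This card does not say how predictions from the five folds should be combined. One common option, shown here purely as an assumption, is to average the count-scaled profiles across folds. `average_folds` is a hypothetical helper, and the toy arrays stand in for per-fold plus-strand profiles.

```python
import numpy as np

# Hypothetical helper: element-wise mean of per-fold profile arrays.
def average_folds(fold_profiles):
    """Average same-shaped arrays, e.g. one (N, 1000) profile per fold."""
    stacked = np.stack(
        [np.asarray(p, dtype=np.float64) for p in fold_profiles], axis=0)
    return stacked.mean(axis=0)

# Toy stand-ins: fold k predicts a constant profile equal to k
toy_folds = [np.full((2, 1000), float(k)) for k in range(5)]
mean_plus = average_folds(toy_folds)
```

With real models, each entry would be the plus (or minus) array obtained from one fold's predictions for the same input sequences.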

Docker image to load and use the models

kundajelab/bpnet-atlas (placeholder; image forthcoming).

Code

License & citation

Released under the MIT license.

The models are derived from ENCODE data (unrestricted use under the ENCODE data-use policy). Please cite the ENCODE Project Consortium and the model software: BPNet (Avsec et al., Nat Genet 2021).
