---
license: mit
library_name: bpnet
tags:
- bpnet
- dna
- genomics
- transcription-factor-binding
- chip-seq
- encode
- encode-bpnet-atlas
- hg38
- qc-passed
- EGR1
---

# ENCODE BPNet Atlas

As part of the ENCODE 4 Project, we trained BPNet models on 2,339 ENCODE transcription factor ChIP-seq experiments spanning 788 targets across 175 biosamples. Here, we provide all models for open-source use.

For more information about the models, see:

- Main ENCODE 4 Paper
- A unified lexicon of predictive DNA sequence motifs from ENCODE transcription factor binding and chromatin accessibility assays (Deshpande et al., Zenodo 2025)
- Base-resolution models of transcription-factor binding reveal soft motif syntax (Avsec et al., Nat Genet 2021)

## BPNet model: EGR1 ChIP-seq in Ishikawa (ENCSR000BSQ)

- Model: BPNet
- Assay: TF ChIP-seq
- Target: EGR1
- Experiment: [ENCSR000BSQ](https://www.encodeproject.org/experiments/ENCSR000BSQ/)
- Model annotation: [ENCSR133QFY](https://www.encodeproject.org/annotations/ENCSR133QFY/)
- Biosample: Ishikawa (Full name: Homo sapiens Ishikawa)
- Cell slim(s): cancer cell
- Organ slim(s): uterus
- Developmental slim(s): None
- System slim(s): reproductive system
- Assembly: hg38

## QC

- Status: passed
- Notes: Found direct motif (counts, profile)

## Directory structure

The models were trained with 5-fold cross-validation, giving one model per fold. Each `fold_*/` directory contains that fold's trained BPNet model in two formats (paths shown for `fold_0`). `config.json` records the training and architecture parameters:

- `fold_0/model.h5`: BPNet model in .h5 (Keras) format
- `fold_0/saved_model/`: BPNet model in TensorFlow SavedModel format (a directory; load it directly)
- `config.json`: training and architecture parameters

## Instructions

BPNet takes a one-hot-encoded DNA sequence plus control (bias) inputs. It predicts stranded profile logits and a total log count. The control inputs are derived from the matched WCE/Input DNA control and **can be passed as zeros**.

### 1. Loading the SavedModel and making predictions

```python
import numpy as np
import tensorflow as tf
from scipy.special import logsumexp

model = tf.saved_model.load("fold_0/saved_model")

# sequence:           (N, 2114, 4) one-hot [A,C,G,T]
# profile_bias_input: (N, 1000, 2) per-base profile bias from WCE/Input control, or zeros
# counts_bias_input:  (N, 2) log2 total counts from WCE/Input control, or zeros
predictions = model.signatures["serving_default"](**{
    "sequence": sequence.astype("float32"),
    "profile_bias_input_0": profile_bias_input.astype("float32"),
    "counts_bias_input_0": counts_bias_input.astype("float32")})
# predictions["profile_predictions"]:   (N, 1000, 2) logits (strands NOT independent)
# predictions["logcounts_predictions"]: (N, 1) total logcount

output_len = 1000

def vectorized_prediction_to_profile(predictions):
    """Convert profile logits and total logcounts into per-strand predicted counts."""
    logits_arr = np.asarray(predictions["profile_predictions"])    # (N, 1000, 2)
    counts_arr = np.asarray(predictions["logcounts_predictions"])  # (N, 1)
    n = len(logits_arr)
    # Flatten both strands onto one axis: a single softmax over all 2 * output_len positions
    pred_profile_logits = np.reshape(logits_arr, [n, 1, output_len * 2])
    probVals_array = np.exp(
        pred_profile_logits - logsumexp(pred_profile_logits, axis=2, keepdims=True))
    # Scale the probabilities by the predicted total counts, then restore (N, 1000, 2)
    profile_predictions = np.exp(counts_arr).reshape([n, 1, 1]) * probVals_array
    profile_predictions = np.reshape(profile_predictions, [n, output_len, 2])
    plus = profile_predictions[:, :, 0]
    minus = profile_predictions[:, :, 1]
    return plus, minus, counts_arr

plus, minus, logcounts = vectorized_prediction_to_profile(predictions)
```

### 2. Loading the .h5 (Keras) and making predictions

```python
import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as kb
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import CustomObjectScope
from bpnet.model.custommodel import CustomModel

def get_model(model_path):
    # Register the custom objects referenced inside the saved .h5 graph
    with CustomObjectScope({"kb": kb, "tf": tf, "CustomModel": CustomModel}):
        return load_model(model_path)

model = get_model("fold_0/model.h5")
N = sequence.shape[0]
predictions = model.predict([
    sequence,                # (N, 2114, 4)
    np.zeros((N, 1000, 2)),  # profile_bias_input (or real WCE/Input control values)
    np.zeros((N, 2))])       # counts_bias_input (or real control log2 counts)
# predictions[0]: (N, 1000, 2) logits; predictions[1]: (N, 1) logcounts

# Convert with vectorized_prediction_to_profile() from section 1, which expects
# the two outputs keyed by name:
plus, minus, logcounts = vectorized_prediction_to_profile({
    "profile_predictions": predictions[0],
    "logcounts_predictions": predictions[1]})
```

## Docker image to load and use the models

`kundajelab/bpnet-atlas` (placeholder; image forthcoming).

## Code

- Code: https://github.com/kundajelab/bpnet/
- Toolbox & downstream analysis: https://github.com/kundajelab/bpnet/wiki

## License & citation

Released under the MIT license. The models are derived from ENCODE data (unrestricted use under the ENCODE data-use policy). Please cite the ENCODE Project Consortium and the model software: BPNet (Avsec et al., Nat Genet 2021).
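The loading examples in the Instructions assume that `sequence`, `profile_bias_input` and `counts_bias_input` already exist. The following is a minimal numpy sketch for building them from DNA strings. It relies on several assumptions:

- The column order is [A,C,G,T], as stated in the Instructions.
- Each string is already exactly 2114 bp long.
- Any base other than A/C/G/T (for example `N`) becomes an all-zero row.
- The control inputs are passed as zeros.

`one_hot_encode` and `make_inputs` are hypothetical helpers written for illustration. They are not functions from the `bpnet` package.

```python
import numpy as np

SEQ_LEN = 2114  # model input length (from the shapes documented above)
OUT_LEN = 1000  # model output profile length

def one_hot_encode(seqs, seq_len=SEQ_LEN):
    """Hypothetical helper: one-hot encode DNA strings to (N, seq_len, 4), [A,C,G,T] order.

    Assumption: bases other than A/C/G/T (case-insensitive), e.g. N, become all-zero rows.
    """
    # Lookup table indexed by ASCII code; unknown characters map to [0, 0, 0, 0]
    lookup = np.zeros((256, 4), dtype=np.float32)
    for i, base in enumerate("ACGT"):
        lookup[ord(base), i] = 1.0
        lookup[ord(base.lower()), i] = 1.0
    out = np.zeros((len(seqs), seq_len, 4), dtype=np.float32)
    for j, s in enumerate(seqs):
        if len(s) != seq_len:
            raise ValueError(f"sequence {j} has length {len(s)}, expected {seq_len}")
        codes = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
        out[j] = lookup[codes]
    return out

def make_inputs(seqs):
    """Hypothetical helper: build the three model inputs, with zero-valued control (bias) tracks."""
    sequence = one_hot_encode(seqs)
    n = sequence.shape[0]
    profile_bias_input = np.zeros((n, OUT_LEN, 2), dtype=np.float32)  # (N, 1000, 2)
    counts_bias_input = np.zeros((n, 2), dtype=np.float32)            # (N, 2)
    return sequence, profile_bias_input, counts_bias_input

# Usage with a toy 2114-bp sequence (4 * 528 + 2 = 2114)
seqs = ["ACGT" * 528 + "NN"]
sequence, profile_bias_input, counts_bias_input = make_inputs(seqs)
```

The arrays returned by `make_inputs()` have the shapes listed in the comments of both loading examples. They can therefore be passed to either the SavedModel signature or `model.predict()`. A lookup table keeps the encoding vectorized per sequence.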