Add BPNet model ENCSR128AHG (ENCSR795CKZ)
- .gitattributes +5 -0
- README.md +137 -0
- config.json +45 -0
- fold_0/model.h5 +3 -0
- fold_0/saved_model/saved_model.pb +3 -0
- fold_0/saved_model/variables/variables.data-00000-of-00001 +3 -0
- fold_0/saved_model/variables/variables.index +0 -0
- fold_1/model.h5 +3 -0
- fold_1/saved_model/saved_model.pb +3 -0
- fold_1/saved_model/variables/variables.data-00000-of-00001 +3 -0
- fold_1/saved_model/variables/variables.index +0 -0
- fold_2/model.h5 +3 -0
- fold_2/saved_model/saved_model.pb +3 -0
- fold_2/saved_model/variables/variables.data-00000-of-00001 +3 -0
- fold_2/saved_model/variables/variables.index +0 -0
- fold_3/model.h5 +3 -0
- fold_3/saved_model/saved_model.pb +3 -0
- fold_3/saved_model/variables/variables.data-00000-of-00001 +3 -0
- fold_3/saved_model/variables/variables.index +0 -0
- fold_4/model.h5 +3 -0
- fold_4/saved_model/saved_model.pb +3 -0
- fold_4/saved_model/variables/variables.data-00000-of-00001 +3 -0
- fold_4/saved_model/variables/variables.index +0 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
fold_0/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
fold_1/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
fold_2/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
fold_3/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
fold_4/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,137 @@
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-4.0
|
| 3 |
+
library_name: bpnet
|
| 4 |
+
tags:
|
| 5 |
+
- bpnet
|
| 6 |
+
- dna
|
| 7 |
+
- genomics
|
| 8 |
+
- transcription-factor-binding
|
| 9 |
+
- chip-seq
|
| 10 |
+
- encode
|
| 11 |
+
- encode-bpnet-atlas
|
| 12 |
+
- hg38
|
| 13 |
+
- qc-unvalidated
|
| 14 |
+
- ZSCAN30
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+
# ENCODE BPNet Atlas
|
| 18 |
+
|
| 19 |
+
As part of the ENCODE 4 Project, we trained BPNet models on 2,339 ENCODE
|
| 20 |
+
transcription factor ChIP-seq experiments spanning 788 targets across
|
| 21 |
+
175 biosamples. Here, we provide all models for open-source use.
|
| 22 |
+
|
| 23 |
+
For more information about the models, see:
|
| 24 |
+
|
| 25 |
+
- Main ENCODE 4 Paper
|
| 26 |
+
- A unified lexicon of predictive DNA sequence motifs from ENCODE transcription
|
| 27 |
+
factor binding and chromatin accessibility assays (Deshpande et al., Zenodo 2025)
|
| 28 |
+
- Base-resolution models of transcription-factor binding reveal soft motif syntax
|
| 29 |
+
(Avsec et al., Nat Genet 2021)
|
| 30 |
+
|
| 31 |
+
## BPNet model: ZSCAN30 ChIP-seq in HepG2 (ENCSR795CKZ)
|
| 32 |
+
|
| 33 |
+
- Model: BPNet
|
| 34 |
+
- Assay: TF ChIP-seq
|
| 35 |
+
- Target: ZSCAN30
|
| 36 |
+
- Experiment: [ENCSR795CKZ](https://www.encodeproject.org/experiments/ENCSR795CKZ/)
|
| 37 |
+
- Model annotation: [ENCSR128AHG](https://www.encodeproject.org/annotations/ENCSR128AHG/)
|
| 38 |
+
- Biosample: HepG2 (Full name: Homo sapiens HepG2 genetically modified (insertion) using CRISPR targeting H. sapiens ZSCAN30)
|
| 39 |
+
- Cell slim(s): epithelial cell, cancer cell
|
| 40 |
+
- Organ slim(s): endocrine gland, liver, exocrine gland, epithelium
|
| 41 |
+
- Developmental slim(s): endoderm
|
| 42 |
+
- System slim(s): endocrine system, digestive system, exocrine system
|
| 43 |
+
- Assembly: hg38
|
| 44 |
+
|
| 45 |
+
## QC
|
| 46 |
+
|
| 47 |
+
- Status: unvalidated
|
| 48 |
+
- Notes: Found potential direct motif (counts)
|
| 49 |
+
|
| 50 |
+
## Directory structure
|
| 51 |
+
|
| 52 |
+
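The models were trained with 5-fold cross-validation. Each `fold_*/` directory contains that fold's trained BPNet model in two formats (`fold_0` shown); `config.json` sits at the repository root: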
|
| 53 |
+
|
| 54 |
+
- `fold_0/model.h5` — BPNet model in .h5 (Keras) format
|
| 55 |
+
- `fold_0/saved_model/` — BPNet model in TensorFlow SavedModel format (a directory; load directly)
|
| 56 |
+
- `config.json` — training / architecture parameters
|
| 57 |
+
|
| 58 |
+
## Instructions
|
| 59 |
+
|
| 60 |
+
BPNet takes a one-hot DNA sequence plus control (bias) inputs and predicts
|
| 61 |
+
stranded profile logits and total logcounts. The control inputs come from the
|
| 62 |
+
matched WCE/Input DNA control and **can be passed as zeros**.
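The input shapes described above can be sketched as follows. This is a minimal illustration, not code from the BPNet repository: `one_hot_encode` and `zero_controls` are hypothetical helper names, the lengths are taken from `config.json`, and encoding non-ACGT bases (such as `N`) as all-zero rows is an assumption.

```python
import numpy as np

INPUT_LEN = 2114   # "input_len" in config.json
OUTPUT_LEN = 1000  # "output_profile_len" in config.json


def one_hot_encode(seqs, alphabet="ACGT"):
    """Encode equal-length DNA strings as an (N, L, 4) float32 array.

    Bases outside the alphabet (e.g. N) become all-zero rows (an assumption).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    out = np.zeros((len(seqs), len(seqs[0]), len(alphabet)), dtype="float32")
    for n, seq in enumerate(seqs):
        if len(seq) != out.shape[1]:
            raise ValueError("all sequences must have the same length")
        for pos, base in enumerate(seq.upper()):
            col = index.get(base)
            if col is not None:
                out[n, pos, col] = 1.0
    return out


def zero_controls(n, output_len=OUTPUT_LEN):
    """Return all-zero profile and counts bias inputs for n sequences."""
    profile_bias = np.zeros((n, output_len, 2), dtype="float32")
    counts_bias = np.zeros((n, 2), dtype="float32")
    return profile_bias, counts_bias


# Two random placeholder sequences stand in for real hg38 windows.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=INPUT_LEN)) for _ in range(2)]
sequence = one_hot_encode(seqs)
profile_bias_input, counts_bias_input = zero_controls(len(seqs))
```

The resulting `sequence`, `profile_bias_input`, and `counts_bias_input` arrays have the shapes that the loading examples in this README expect.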
|
| 63 |
+
|
| 64 |
+
### 1. Loading the SavedModel and making predictions
|
| 65 |
+
|
| 66 |
+
```python
|
| 67 |
+
import numpy as np
|
| 68 |
+
import tensorflow as tf
|
| 69 |
+
from scipy.special import logsumexp
|
| 70 |
+
|
| 71 |
+
model = tf.saved_model.load("fold_0/saved_model")
|
| 72 |
+
# sequence: (N, 2114, 4) one-hot [A,C,G,T]
|
| 73 |
+
# profile_bias_input: (N, 1000, 2) per-base profile bias from WCE/Input control, or zeros
|
| 74 |
+
# counts_bias_input: (N, 2) log2 total counts from WCE/Input control, or zeros
|
| 75 |
+
predictions = model.signatures["serving_default"](**{
|
| 76 |
+
"sequence": sequence.astype("float32"),
|
| 77 |
+
"profile_bias_input_0": profile_bias_input.astype("float32"),
|
| 78 |
+
"counts_bias_input_0": counts_bias_input.astype("float32")})
|
| 79 |
+
# predictions["profile_predictions"]: (N, 1000, 2) logits (strands NOT independent)
|
| 80 |
+
# predictions["logcounts_predictions"]: (N, 1) total logcount
|
| 81 |
+
|
| 82 |
+
output_len = 1000
|
| 83 |
+
def vectorized_prediction_to_profile(predictions):
|
| 84 |
+
    logits_arr = predictions["profile_predictions"]
|
| 85 |
+
    counts_arr = predictions["logcounts_predictions"]
|
| 86 |
+
    pred_profile_logits = np.reshape(logits_arr, [-1, 1, output_len * 2])
|
| 87 |
+
    probVals_array = np.exp(pred_profile_logits - logsumexp(
|
| 88 |
+
        pred_profile_logits, axis=2).reshape([len(logits_arr), 1, 1]))
|
| 89 |
+
    profile_predictions = np.multiply(
|
| 90 |
+
        np.exp(counts_arr).reshape([len(counts_arr), 1, 1]), probVals_array)
|
| 91 |
+
    plus = np.reshape(profile_predictions, [len(counts_arr), output_len, 2])[:, :, 0]
|
| 92 |
+
    minus = np.reshape(profile_predictions, [len(counts_arr), output_len, 2])[:, :, 1]
|
| 93 |
+
    return plus, minus, counts_arr
|
| 94 |
+
|
| 95 |
+
plus, minus, logcounts = vectorized_prediction_to_profile(predictions)
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
### 2. Loading the .h5 (Keras) and making predictions
|
| 99 |
+
|
| 100 |
+
```python
|
| 101 |
+
import numpy as np
|
| 102 |
+
import tensorflow as tf
|
| 103 |
+
import tensorflow.keras.backend as kb
|
| 104 |
+
from tensorflow.keras.models import load_model
|
| 105 |
+
from tensorflow.keras.utils import CustomObjectScope
|
| 106 |
+
from bpnet.model.custommodel import CustomModel
|
| 107 |
+
|
| 108 |
+
def get_model(model_path):
|
| 109 |
+
    with CustomObjectScope({"kb": kb, "tf": tf, "CustomModel": CustomModel}):
|
| 110 |
+
        return load_model(model_path)
|
| 111 |
+
|
| 112 |
+
model = get_model("fold_0/model.h5")
|
| 113 |
+
N = sequence.shape[0]
|
| 114 |
+
predictions = model.predict([
|
| 115 |
+
sequence, # (N, 2114, 4)
|
| 116 |
+
np.zeros((N, 1000, 2)), # profile_bias_input (or real WCE/Input control values)
|
| 117 |
+
np.zeros((N, 2))]) # counts_bias_input (or real control log2 counts)
|
| 118 |
+
# predictions[0]: (N, 1000, 2) logits; predictions[1]: (N, 1) logcounts
|
| 119 |
+
# convert with vectorized_prediction_to_profile({"profile_predictions": predictions[0], "logcounts_predictions": predictions[1]})
|
| 120 |
+
```
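Because the repository ships one model per fold, a caller may want a single prediction per sequence. The sketch below averages the count-scaled profiles across folds. Averaging is an assumption here, not a procedure this README prescribes, and `logits_to_profile` and `average_folds` are hypothetical names. The conversion mirrors the joint softmax over positions and both strands used by `vectorized_prediction_to_profile`, written with NumPy only and demonstrated on synthetic arrays rather than real model outputs.

```python
import numpy as np


def logits_to_profile(logits, logcounts):
    """Joint softmax over positions and strands, scaled by exp(logcounts).

    logits: (N, L, 2); logcounts: (N, 1). Returns an (N, L, 2) array.
    """
    logits = np.asarray(logits, dtype="float64")
    logcounts = np.asarray(logcounts, dtype="float64")
    n, length, strands = logits.shape
    flat = logits.reshape(n, length * strands)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    counts = np.exp(logcounts.reshape(n, 1))
    return (probs * counts).reshape(n, length, strands)


def average_folds(fold_outputs):
    """Average count-scaled profiles over (logits, logcounts) pairs, one per fold."""
    profiles = [logits_to_profile(lg, lc) for lg, lc in fold_outputs]
    return np.mean(profiles, axis=0)


# Synthetic stand-ins for the outputs of the five fold models on 3 sequences.
rng = np.random.default_rng(0)
fold_outputs = [
    (rng.normal(size=(3, 1000, 2)), rng.normal(size=(3, 1))) for _ in range(5)
]
mean_profile = average_folds(fold_outputs)
plus, minus = mean_profile[:, :, 0], mean_profile[:, :, 1]
```

Averaging in count space rather than logit space keeps each fold's total equal to `exp(logcounts)` before the folds are combined; other aggregation choices are equally plausible.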
|
| 121 |
+
|
| 122 |
+
## Docker image to load and use the models
|
| 123 |
+
|
| 124 |
+
`kundajelab/bpnet-atlas` (placeholder — image forthcoming).
|
| 125 |
+
|
| 126 |
+
## Code
|
| 127 |
+
|
| 128 |
+
- Code: https://github.com/kundajelab/bpnet/
|
| 129 |
+
- Toolbox & downstream analysis: https://github.com/kundajelab/bpnet/wiki
|
| 130 |
+
|
| 131 |
+
## License & citation
|
| 132 |
+
|
| 133 |
+
External data users may freely download, analyze and publish results based on any
|
| 134 |
+
ENCODE data without restrictions.
|
| 135 |
+
|
| 136 |
+
Released under the ENCODE data-use policy. Please cite the ENCODE Project
|
| 137 |
+
Consortium and the model software: BPNet (Avsec et al., Nat Genet 2021).
|
config.json
ADDED
|
@@ -0,0 +1,45 @@
|
| 1 |
+
{
|
| 2 |
+
"input_len": 2114,
|
| 3 |
+
"output_profile_len": 1000,
|
| 4 |
+
"motif_module_params": {
|
| 5 |
+
"filters": [64],
|
| 6 |
+
"kernel_sizes": [21],
|
| 7 |
+
"padding": "valid"
|
| 8 |
+
},
|
| 9 |
+
"syntax_module_params": {
|
| 10 |
+
"num_dilation_layers": 8,
|
| 11 |
+
"filters": 64,
|
| 12 |
+
"kernel_size": 3,
|
| 13 |
+
"padding": "valid",
|
| 14 |
+
"pre_activation_residual_unit": true
|
| 15 |
+
},
|
| 16 |
+
"profile_head_params": {
|
| 17 |
+
"filters": 1,
|
| 18 |
+
"kernel_size": 75,
|
| 19 |
+
"padding": "valid"
|
| 20 |
+
},
|
| 21 |
+
"counts_head_params": {
|
| 22 |
+
"filters": 1,
|
| 23 |
+
"kernel_size": 75,
|
| 24 |
+
"padding": "valid",
|
| 25 |
+
"units": [1],
|
| 26 |
+
"activations":["linear"],
|
| 27 |
+
"dropouts":[0]
|
| 28 |
+
|
| 29 |
+
},
|
| 30 |
+
"profile_bias_module_params": {
|
| 31 |
+
"kernel_sizes": [1]
|
| 32 |
+
},
|
| 33 |
+
"counts_bias_module_params": {
|
| 34 |
+
},
|
| 35 |
+
"use_attribution_prior": false,
|
| 36 |
+
"attribution_prior_params": {
|
| 37 |
+
"frequency_limit": 150,
|
| 38 |
+
"limit_softness": 0.2,
|
| 39 |
+
"grad_smooth_sigma": 3,
|
| 40 |
+
"profile_grad_loss_weight": 200,
|
| 41 |
+
"counts_grad_loss_weight": 100
|
| 42 |
+
},
|
| 43 |
+
"loss_weights": [1, 89.23013136288998],
|
| 44 |
+
"counts_loss": "MSE"
|
| 45 |
+
}
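The kernel sizes and `"valid"` padding in this config account for the difference between the 2114 bp input and the 1000 bp output profile. The arithmetic below is a sketch under one assumption that is not stated in `config.json`: the dilated layers use dilation rates 2, 4, ..., 2^8. `output_length` is a hypothetical helper, not part of the BPNet package.

```python
def output_length(input_len, motif_kernel, num_dilation_layers,
                  dilated_kernel, head_kernel):
    """Track sequence length through 'valid' (unpadded) convolutions."""
    # Motif module: a valid convolution trims (kernel - 1) positions.
    length = input_len - (motif_kernel - 1)
    # Syntax module: each dilated layer trims dilation * (kernel - 1).
    for i in range(1, num_dilation_layers + 1):
        dilation = 2 ** i  # assumed schedule, not given in config.json
        length -= dilation * (dilated_kernel - 1)
    # Profile head: one more valid convolution.
    return length - (head_kernel - 1)


predicted = output_length(
    input_len=2114,          # "input_len"
    motif_kernel=21,         # motif_module_params.kernel_sizes
    num_dilation_layers=8,   # syntax_module_params.num_dilation_layers
    dilated_kernel=3,        # syntax_module_params.kernel_size
    head_kernel=75,          # profile_head_params.kernel_size
)
print(predicted)  # 1000
```

Under that assumption the steps are 2114 - 20 = 2094, then 2094 - 1020 = 1074, then 1074 - 74 = 1000, which matches `output_profile_len`.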
|
fold_0/model.h5
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3513b74c44b32181fab8ade2a8ba4a33fdb7a8e6d7c8fb0b67969e31b370c38d
|
| 3 |
+
size 561784
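Each of these three-line entries is a Git LFS pointer, not the model file itself. After downloading the real file, it can be checked against the pointer's `oid` and `size`. The sketch below uses only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is up to the caller.

```python
import hashlib
import os

# Pointer text for fold_0/model.h5, copied from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3513b74c44b32181fab8ade2a8ba4a33fdb7a8e6d7c8fb0b67969e31b370c38d
size 561784
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at path has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("fold_0/model.h5", pointer)` on a fully downloaded copy should return `True`; a `False` result usually means the path still holds the pointer text rather than the binary.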
|
fold_0/saved_model/saved_model.pb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4a82bf37571c9946c597eea36052307470566aeaa79d9e24648605812fea1876
|
| 3 |
+
size 750201
|
fold_0/saved_model/variables/variables.data-00000-of-00001
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:94c2ca1e39532719eb1d297cee87848a1b515e83f31a1f9ff790da0c36e9aec4
|
| 3 |
+
size 1393322
|
fold_0/saved_model/variables/variables.index
ADDED
|
Binary file (5.77 kB).
|
|
|
fold_1/model.h5
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e2c070ee20eacb4fa69effba2c1cdfb1b28e4a45645996d919d3b9bf9c041573
|
| 3 |
+
size 561784
|
fold_1/saved_model/saved_model.pb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:dc1dfa16250a5378ec2f847a231f76619c9669273ba0a441abffc09f7711f516
|
| 3 |
+
size 750201
|
fold_1/saved_model/variables/variables.data-00000-of-00001
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3a44ac3aa8e252b9ad451134edd4b59d522ec909b74e54dd7c6155782acd8bf7
|
| 3 |
+
size 1393322
|
fold_1/saved_model/variables/variables.index
ADDED
|
Binary file (5.77 kB).
|
|
|
fold_2/model.h5
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:09d671d80e01d8fac60074eed8708eea28c12f83f77f328fd96b246af53a38b2
|
| 3 |
+
size 561784
|
fold_2/saved_model/saved_model.pb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d455c0ab3d2939b7ec9d358a6f657c27a62c1b449a78813aeb941e3ec61c73f8
|
| 3 |
+
size 750201
|
fold_2/saved_model/variables/variables.data-00000-of-00001
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:131550022c0fbd523e3a8f7e05cfb6aed4ba7ca3adac755b4c3c1097cdb710d7
|
| 3 |
+
size 1393322
|
fold_2/saved_model/variables/variables.index
ADDED
|
Binary file (5.77 kB).
|
|
|
fold_3/model.h5
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:73d503552a17b7e4977b56158aff03b7919ae0060fd0f00f32f91e95165a6e91
|
| 3 |
+
size 561784
|
fold_3/saved_model/saved_model.pb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d6acf829a5e3e4aaa762e295e55804ea2a2e672907be0baa21d2aca1d0ec3d7e
|
| 3 |
+
size 750201
|
fold_3/saved_model/variables/variables.data-00000-of-00001
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3b2feea924fa491b7d976e9fea6b73d413c8c6b9656268378fd34fa7cdc1ff79
|
| 3 |
+
size 1393322
|
fold_3/saved_model/variables/variables.index
ADDED
|
Binary file (5.77 kB).
|
|
|
fold_4/model.h5
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ca02ce9190fdd11001714dab1653fac5d89f326836bdeac4560d543f29505022
|
| 3 |
+
size 561784
|
fold_4/saved_model/saved_model.pb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:29ceaf25ef0bd8cb282593dec1df9be4ba4fafea1c02b79d73f3c0f8858b0d72
|
| 3 |
+
size 750201
|
fold_4/saved_model/variables/variables.data-00000-of-00001
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:88c2ef6838726dac3c9b59067bc824d6bab7ad350d561d300208576f16d22534
|
| 3 |
+
size 1393322
|
fold_4/saved_model/variables/variables.index
ADDED
|
Binary file (5.77 kB).
|
|
|