vr-scientist committed · Commit cae6c71 (verified) · 1 Parent(s): 904ee5a

Add BPNet model ENCSR128AHG (ENCSR795CKZ)

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ fold_0/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+ fold_1/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+ fold_2/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+ fold_3/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+ fold_4/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,137 @@
+ ---
+ license: cc-by-4.0
+ library_name: bpnet
+ tags:
+ - bpnet
+ - dna
+ - genomics
+ - transcription-factor-binding
+ - chip-seq
+ - encode
+ - encode-bpnet-atlas
+ - hg38
+ - qc-unvalidated
+ - ZSCAN30
+ ---
+
+ # ENCODE BPNet Atlas
+
+ As part of the ENCODE 4 Project, we trained BPNet models on 2,339 ENCODE
+ transcription factor ChIP-seq experiments spanning 788 targets across
+ 175 biosamples. Here, we provide all models for open-source use.
+
+ For more information about the models, see:
+
+ - Main ENCODE 4 Paper
+ - A unified lexicon of predictive DNA sequence motifs from ENCODE transcription
+   factor binding and chromatin accessibility assays (Deshpande et al., Zenodo 2025)
+ - Base-resolution models of transcription-factor binding reveal soft motif syntax
+   (Avsec et al., Nat Genet 2021)
+
+ ## BPNet model: ZSCAN30 ChIP-seq in HepG2 (ENCSR795CKZ)
+
+ - Model: BPNet
+ - Assay: TF ChIP-seq
+ - Target: ZSCAN30
+ - Experiment: [ENCSR795CKZ](https://www.encodeproject.org/experiments/ENCSR795CKZ/)
+ - Model annotation: [ENCSR128AHG](https://www.encodeproject.org/annotations/ENCSR128AHG/)
+ - Biosample: HepG2 (Full name: Homo sapiens HepG2 genetically modified (insertion) using CRISPR targeting H. sapiens ZSCAN30)
+ - Cell slim(s): epithelial cell, cancer cell
+ - Organ slim(s): endocrine gland, liver, exocrine gland, epithelium
+ - Developmental slim(s): endoderm
+ - System slim(s): endocrine system, digestive system, exocrine system
+ - Assembly: hg38
+
+ ## QC
+
+ - Status: unvalidated
+ - Notes: Found potential direct motif (counts);
+
+ ## Directory structure
+
+ Models were trained with 5-fold cross-validation. Each `fold_*/` directory contains the trained BPNet model in two formats (shown here for `fold_0`); `config.json` sits at the repository root:
+
+ - `fold_0/model.h5`: BPNet model in .h5 (Keras) format
+ - `fold_0/saved_model/`: BPNet model in TensorFlow SavedModel format (a directory; load directly)
+ - `config.json`: training / architecture parameters
+
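As a rough illustration of working with the five folds, the sketch below lists each fold's two artifacts and averages per-fold count profiles. The helper names `fold_dirs` and `average_fold_profiles` are hypothetical, and averaging across folds is an assumption made for illustration; the README does not state how fold predictions should be combined.

```python
import numpy as np

def fold_dirs(n_folds=5):
    """Return (h5_path, saved_model_dir) for each fold, following the layout above."""
    return [(f"fold_{i}/model.h5", f"fold_{i}/saved_model") for i in range(n_folds)]

def average_fold_profiles(per_fold_profiles):
    """Average predicted count profiles across folds (an assumed way to combine them).

    per_fold_profiles: list of arrays with identical shape, one per fold.
    """
    stacked = np.stack(per_fold_profiles, axis=0)  # (n_folds, ...)
    return stacked.mean(axis=0)

for h5_path, saved_model_dir in fold_dirs():
    print(h5_path, saved_model_dir)
```

Each entry returned by `fold_dirs()` can be passed to whichever loading route is preferred, and the resulting per-fold profiles handed to `average_fold_profiles`.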
+ ## Instructions
+
+ BPNet takes a one-hot-encoded DNA sequence (2,114 bp) plus control (bias) inputs and predicts
+ stranded profile logits (1,000 bp) and a total logcount. The control inputs come from the
+ matched WCE/Input DNA control and **can be passed as zeros**.
+
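A minimal sketch of preparing these inputs, assuming NumPy only. The helpers `one_hot_encode` and `zero_bias_inputs` are hypothetical names, and mapping non-ACGT characters (such as `N`) to all-zero rows is an assumption rather than documented BPNet behaviour. The shapes follow `input_len` and `output_profile_len` in `config.json`.

```python
import numpy as np

INPUT_LEN = 2114   # "input_len" in config.json
OUTPUT_LEN = 1000  # "output_profile_len" in config.json

def one_hot_encode(seqs, alphabet="ACGT"):
    """Encode equal-length DNA strings as an (N, L, 4) float32 array in [A, C, G, T] order.

    Characters outside the alphabet (e.g. N) are left as all-zero rows (assumption).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    out = np.zeros((len(seqs), len(seqs[0]), len(alphabet)), dtype=np.float32)
    for n, seq in enumerate(seqs):
        for pos, base in enumerate(seq.upper()):
            col = index.get(base)
            if col is not None:
                out[n, pos, col] = 1.0
    return out

def zero_bias_inputs(n):
    """Zero-filled control inputs for n sequences."""
    profile_bias_input = np.zeros((n, OUTPUT_LEN, 2), dtype=np.float32)
    counts_bias_input = np.zeros((n, 2), dtype=np.float32)
    return profile_bias_input, counts_bias_input

# Placeholder 2,114 bp sequence; replace with real genomic sequence.
sequence = one_hot_encode(["ACGT" * 528 + "AC"])
profile_bias_input, counts_bias_input = zero_bias_inputs(len(sequence))
```

The resulting `sequence`, `profile_bias_input`, and `counts_bias_input` arrays have the shapes the loading examples expect.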
+ ### 1. Loading the SavedModel and making predictions
+
+ ```python
+ import numpy as np
+ import tensorflow as tf
+ from scipy.special import logsumexp
+
+ model = tf.saved_model.load("fold_0/saved_model")
+ # sequence: (N, 2114, 4) one-hot [A,C,G,T]
+ # profile_bias_input: (N, 1000, 2) per-base profile bias from WCE/Input control, or zeros
+ # counts_bias_input: (N, 2) log2 total counts from WCE/Input control, or zeros
+ predictions = model.signatures["serving_default"](**{  # signature functions expect tf.Tensor inputs
+     "sequence": tf.constant(sequence.astype("float32")),
+     "profile_bias_input_0": tf.constant(profile_bias_input.astype("float32")),
+     "counts_bias_input_0": tf.constant(counts_bias_input.astype("float32"))})
+ # predictions["profile_predictions"]: (N, 1000, 2) logits (strands NOT independent)
+ # predictions["logcounts_predictions"]: (N, 1) total logcount
+
+ output_len = 1000
+ def vectorized_prediction_to_profile(predictions):
+     logits_arr = predictions["profile_predictions"]
+     counts_arr = predictions["logcounts_predictions"]
+     pred_profile_logits = np.reshape(logits_arr, [-1, 1, output_len * 2])  # flatten both strands together
+     probVals_array = np.exp(pred_profile_logits - logsumexp(  # joint softmax over 2 * output_len bins
+         pred_profile_logits, axis=2).reshape([len(logits_arr), 1, 1]))
+     profile_predictions = np.multiply(  # scale probabilities by the predicted total counts
+         np.exp(counts_arr).reshape([len(counts_arr), 1, 1]), probVals_array)
+     plus = np.reshape(profile_predictions, [len(counts_arr), output_len, 2])[:, :, 0]
+     minus = np.reshape(profile_predictions, [len(counts_arr), output_len, 2])[:, :, 1]
+     return plus, minus, counts_arr
+
+ plus, minus, logcounts = vectorized_prediction_to_profile(predictions)
+ ```
+
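The conversion from logits and logcounts to per-strand counts can be checked without loading a model. The sketch below is a NumPy-only restatement of the same idea on random toy data (not real model output): a single softmax over both strands jointly, scaled by `exp(logcount)`, so that the plus and minus profiles together sum to the predicted total. The function name `logits_to_counts` is hypothetical.

```python
import numpy as np

def logits_to_counts(profile_logits, logcounts):
    """Joint softmax over both strands, scaled by exp(logcounts).

    profile_logits: (N, L, 2) array; logcounts: (N, 1) array of natural-log totals.
    Returns per-strand count profiles, each of shape (N, L).
    """
    n, length, strands = profile_logits.shape
    flat = profile_logits.reshape(n, length * strands)
    flat = flat - flat.max(axis=1, keepdims=True)   # stabilise the exponentials
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)       # probabilities over 2 * L bins
    counts = probs * np.exp(logcounts).reshape(n, 1)
    counts = counts.reshape(n, length, strands)
    return counts[:, :, 0], counts[:, :, 1]

rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(3, 1000, 2))
toy_logcounts = np.log(np.array([[50.0], [200.0], [1000.0]]))
toy_plus, toy_minus = logits_to_counts(toy_logits, toy_logcounts)
toy_total = toy_plus.sum(axis=1) + toy_minus.sum(axis=1)
```

Because the softmax spans both strands, neither strand sums to the total on its own; only `toy_plus` and `toy_minus` together recover `exp(toy_logcounts)`.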
+ ### 2. Loading the .h5 (Keras) and making predictions
+
+ ```python
+ import numpy as np
+ import tensorflow as tf
+ import tensorflow.keras.backend as kb
+ from tensorflow.keras.models import load_model
+ from tensorflow.keras.utils import CustomObjectScope
+ from bpnet.model.custommodel import CustomModel
+
+ def get_model(model_path):
+     with CustomObjectScope({"kb": kb, "tf": tf, "CustomModel": CustomModel}):
+         return load_model(model_path)
+
+ model = get_model("fold_0/model.h5")
+ N = sequence.shape[0]
+ predictions = model.predict([
+     sequence,                 # (N, 2114, 4)
+     np.zeros((N, 1000, 2)),   # profile_bias_input (or real WCE/Input control values)
+     np.zeros((N, 2))])        # counts_bias_input (or real control log2 counts)
+ # predictions[0]: (N, 1000, 2) logits; predictions[1]: (N, 1) logcounts
+ # convert with vectorized_prediction_to_profile({"profile_predictions": predictions[0], "logcounts_predictions": predictions[1]})
+ ```
+
+ ## Docker image to load and use the models
+
+ `kundajelab/bpnet-atlas` (placeholder; image forthcoming).
+
+ ## Code
+
+ - Code: https://github.com/kundajelab/bpnet/
+ - Toolbox & downstream analysis: https://github.com/kundajelab/bpnet/wiki
+
+ ## License & citation
+
+ External data users may freely download, analyze and publish results based on any
+ ENCODE data without restrictions.
+
+ Released under the ENCODE data-use policy. Please cite the ENCODE Project
+ Consortium and the model software: BPNet (Avsec et al., Nat Genet 2021).
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+     "input_len": 2114,
+     "output_profile_len": 1000,
+     "motif_module_params": {
+         "filters": [64],
+         "kernel_sizes": [21],
+         "padding": "valid"
+     },
+     "syntax_module_params": {
+         "num_dilation_layers": 8,
+         "filters": 64,
+         "kernel_size": 3,
+         "padding": "valid",
+         "pre_activation_residual_unit": true
+     },
+     "profile_head_params": {
+         "filters": 1,
+         "kernel_size": 75,
+         "padding": "valid"
+     },
+     "counts_head_params": {
+         "filters": 1,
+         "kernel_size": 75,
+         "padding": "valid",
+         "units": [1],
+         "activations":["linear"],
+         "dropouts":[0]
+
+     },
+     "profile_bias_module_params": {
+         "kernel_sizes": [1]
+     },
+     "counts_bias_module_params": {
+     },
+     "use_attribution_prior": false,
+     "attribution_prior_params": {
+         "frequency_limit": 150,
+         "limit_softness": 0.2,
+         "grad_smooth_sigma": 3,
+         "profile_grad_loss_weight": 200,
+         "counts_grad_loss_weight": 100
+     },
+     "loss_weights": [1, 89.23013136288998],
+     "counts_loss": "MSE"
+ }
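The relationship between `input_len` (2114) and `output_profile_len` (1000) can be reproduced from the kernel sizes above, since every convolution uses `"valid"` padding and therefore shortens the sequence by `dilation * (kernel_size - 1)`. The sketch below is a back-of-the-envelope check, not the training code. The dilation schedule (doubling from 2 up to 256 across the 8 layers) is an assumption that is not stated in `config.json`; it is chosen because it reproduces the configured output length.

```python
def valid_conv_out(length, kernel_size, dilation=1):
    """Output length of a stride-1 'valid' 1D convolution."""
    return length - dilation * (kernel_size - 1)

def profile_output_len(input_len=2114, motif_kernel=21, n_dilated=8,
                       dilated_kernel=3, head_kernel=75):
    length = valid_conv_out(input_len, motif_kernel)          # motif module
    for i in range(1, n_dilated + 1):
        # Assumed schedule: dilation 2**i for layer i (2, 4, ..., 256).
        length = valid_conv_out(length, dilated_kernel, dilation=2 ** i)
    return valid_conv_out(length, head_kernel)                # profile head

out_len = profile_output_len()
receptive_field = 2114 - out_len + 1
print(out_len, receptive_field)
```

Under that assumed schedule the lengths go 2114, 2094 after the motif layer, 1074 after the dilated stack, and 1000 after the profile head, which matches `output_profile_len`.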
fold_0/model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3513b74c44b32181fab8ade2a8ba4a33fdb7a8e6d7c8fb0b67969e31b370c38d
+ size 561784
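The three added lines are a Git LFS pointer rather than the model weights themselves: the pointer records the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking downloaded bytes against it is shown below; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only the Python standard library is used.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(data, info):
    """Check raw file bytes against a parsed pointer (SHA-256 pointers only)."""
    return (len(data) == info["size"]
            and hashlib.sha256(data).hexdigest() == info["digest"])

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:3513b74c44b32181fab8ade2a8ba4a33fdb7a8e6d7c8fb0b67969e31b370c38d
size 561784
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After fetching the real file (for example with `git lfs pull`), `matches_pointer(open("fold_0/model.h5", "rb").read(), info)` would confirm that the download is complete and uncorrupted.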
fold_0/saved_model/saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a82bf37571c9946c597eea36052307470566aeaa79d9e24648605812fea1876
+ size 750201
fold_0/saved_model/variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94c2ca1e39532719eb1d297cee87848a1b515e83f31a1f9ff790da0c36e9aec4
+ size 1393322
fold_0/saved_model/variables/variables.index ADDED
Binary file (5.77 kB)
fold_1/model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2c070ee20eacb4fa69effba2c1cdfb1b28e4a45645996d919d3b9bf9c041573
+ size 561784
fold_1/saved_model/saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc1dfa16250a5378ec2f847a231f76619c9669273ba0a441abffc09f7711f516
+ size 750201
fold_1/saved_model/variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a44ac3aa8e252b9ad451134edd4b59d522ec909b74e54dd7c6155782acd8bf7
+ size 1393322
fold_1/saved_model/variables/variables.index ADDED
Binary file (5.77 kB)
fold_2/model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:09d671d80e01d8fac60074eed8708eea28c12f83f77f328fd96b246af53a38b2
+ size 561784
fold_2/saved_model/saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d455c0ab3d2939b7ec9d358a6f657c27a62c1b449a78813aeb941e3ec61c73f8
+ size 750201
fold_2/saved_model/variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:131550022c0fbd523e3a8f7e05cfb6aed4ba7ca3adac755b4c3c1097cdb710d7
+ size 1393322
fold_2/saved_model/variables/variables.index ADDED
Binary file (5.77 kB)
fold_3/model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73d503552a17b7e4977b56158aff03b7919ae0060fd0f00f32f91e95165a6e91
+ size 561784
fold_3/saved_model/saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6acf829a5e3e4aaa762e295e55804ea2a2e672907be0baa21d2aca1d0ec3d7e
+ size 750201
fold_3/saved_model/variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b2feea924fa491b7d976e9fea6b73d413c8c6b9656268378fd34fa7cdc1ff79
+ size 1393322
fold_3/saved_model/variables/variables.index ADDED
Binary file (5.77 kB)
fold_4/model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca02ce9190fdd11001714dab1653fac5d89f326836bdeac4560d543f29505022
+ size 561784
fold_4/saved_model/saved_model.pb ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29ceaf25ef0bd8cb282593dec1df9be4ba4fafea1c02b79d73f3c0f8858b0d72
+ size 750201
fold_4/saved_model/variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88c2ef6838726dac3c9b59067bc824d6bab7ad350d561d300208576f16d22534
+ size 1393322
fold_4/saved_model/variables/variables.index ADDED
Binary file (5.77 kB)