vr-scientist committed 17 days ago

Commit cae6c71 (verified)

1 Parent(s): 904ee5a

Add BPNet model ENCSR128AHG (ENCSR795CKZ)

Files changed (23)

.gitattributes +5 -0
README.md +137 -0
config.json +45 -0
fold_0/model.h5 +3 -0
fold_0/saved_model/saved_model.pb +3 -0
fold_0/saved_model/variables/variables.data-00000-of-00001 +3 -0
fold_0/saved_model/variables/variables.index +0 -0
fold_1/model.h5 +3 -0
fold_1/saved_model/saved_model.pb +3 -0
fold_1/saved_model/variables/variables.data-00000-of-00001 +3 -0
fold_1/saved_model/variables/variables.index +0 -0
fold_2/model.h5 +3 -0
fold_2/saved_model/saved_model.pb +3 -0
fold_2/saved_model/variables/variables.data-00000-of-00001 +3 -0
fold_2/saved_model/variables/variables.index +0 -0
fold_3/model.h5 +3 -0
fold_3/saved_model/saved_model.pb +3 -0
fold_3/saved_model/variables/variables.data-00000-of-00001 +3 -0
fold_3/saved_model/variables/variables.index +0 -0
fold_4/model.h5 +3 -0
fold_4/saved_model/saved_model.pb +3 -0
fold_4/saved_model/variables/variables.data-00000-of-00001 +3 -0
fold_4/saved_model/variables/variables.index +0 -0

.gitattributes CHANGED

@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+fold_0/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+fold_1/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+fold_2/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+fold_3/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+fold_4/saved_model/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,137 @@

+---
+license: cc-by-4.0
+library_name: bpnet
+tags:
+  - bpnet
+  - dna
+  - genomics
+  - transcription-factor-binding
+  - chip-seq
+  - encode
+  - encode-bpnet-atlas
+  - hg38
+  - qc-unvalidated
+  - ZSCAN30
+---
+# ENCODE BPNet Atlas
+As part of the ENCODE 4 Project, we trained BPNet models on 2,339 ENCODE
+transcription factor ChIP-seq experiments spanning 788 targets across
+175 biosamples. Here, we provide all models for open-source use.
+For more information about the models, see:
+- Main ENCODE 4 Paper
+- A unified lexicon of predictive DNA sequence motifs from ENCODE transcription
+  factor binding and chromatin accessibility assays (Deshpande et al., Zenodo 2025)
+- Base-resolution models of transcription-factor binding reveal soft motif syntax
+  (Avsec et al., Nat Genet 2021)
+## BPNet model: ZSCAN30 ChIP-seq in HepG2 (ENCSR795CKZ)
+- Model: BPNet
+- Assay: TF ChIP-seq
+- Target: ZSCAN30
+- Experiment: [ENCSR795CKZ](https://www.encodeproject.org/experiments/ENCSR795CKZ/)
+- Model annotation: [ENCSR128AHG](https://www.encodeproject.org/annotations/ENCSR128AHG/)
+- Biosample: HepG2 (Full name: Homo sapiens HepG2 genetically modified (insertion) using CRISPR targeting H. sapiens ZSCAN30)
+- Cell slim(s): epithelial cell, cancer cell
+- Organ slim(s): endocrine gland, liver, exocrine gland, epithelium
+- Developmental slim(s): endoderm
+- System slim(s): endocrine system, digestive system, exocrine system
+- Assembly: hg38
+## QC
+- Status: unvalidated
+- Notes: Found potential direct motif (counts);
+## Directory structure
+5-fold cross-validation. Each `fold_*/` directory (`fold_0/` through `fold_4/`) contains the trained BPNet model in two formats, and `config.json` sits at the repository root:
+- `fold_0/model.h5` — BPNet model in .h5 (Keras) format
+- `fold_0/saved_model/` — BPNet model in TensorFlow SavedModel format (a directory; load directly)
+- `config.json` — training / architecture parameters
+## Instructions
+BPNet takes a one-hot DNA sequence plus control (bias) inputs and predicts
+stranded profile logits and total logcounts. The control inputs come from the
+matched WCE/Input DNA control and **can be passed as zeros**.
+### 1. Loading the SavedModel and making predictions
+```python
+import numpy as np
+import tensorflow as tf
+from scipy.special import logsumexp
+model = tf.saved_model.load("fold_0/saved_model")
+# sequence: (N, 2114, 4) one-hot [A,C,G,T]
+# profile_bias_input: (N, 1000, 2) per-base profile bias from WCE/Input control, or zeros
+# counts_bias_input:  (N, 2) log2 total counts from WCE/Input control, or zeros
+predictions = model.signatures["serving_default"](**{
+    "sequence": sequence.astype("float32"),
+    "profile_bias_input_0": profile_bias_input.astype("float32"),
+    "counts_bias_input_0": counts_bias_input.astype("float32")})
+# predictions["profile_predictions"]:   (N, 1000, 2) logits (strands NOT independent)
+# predictions["logcounts_predictions"]: (N, 1) total logcount
+output_len = 1000
+def vectorized_prediction_to_profile(predictions):
+    logits_arr = predictions["profile_predictions"]
+    counts_arr = predictions["logcounts_predictions"]
+    pred_profile_logits = np.reshape(logits_arr, [-1, 1, output_len * 2])
+    probVals_array = np.exp(pred_profile_logits - logsumexp(
+        pred_profile_logits, axis=2).reshape([len(logits_arr), 1, 1]))
+    profile_predictions = np.multiply(
+        np.exp(counts_arr).reshape([len(counts_arr), 1, 1]), probVals_array)
+    plus  = np.reshape(profile_predictions, [len(counts_arr), output_len, 2])[:, :, 0]
+    minus = np.reshape(profile_predictions, [len(counts_arr), output_len, 2])[:, :, 1]
+    return plus, minus, counts_arr
+plus, minus, logcounts = vectorized_prediction_to_profile(predictions)
+```
+### 2. Loading the .h5 (Keras) and making predictions
+```python
+import numpy as np
+import tensorflow as tf
+import tensorflow.keras.backend as kb
+from tensorflow.keras.models import load_model
+from tensorflow.keras.utils import CustomObjectScope
+from bpnet.model.custommodel import CustomModel
+def get_model(model_path):
+    with CustomObjectScope({"kb": kb, "tf": tf, "CustomModel": CustomModel}):
+        return load_model(model_path)
+model = get_model("fold_0/model.h5")
+N = sequence.shape[0]
+predictions = model.predict([
+    sequence,                     # (N, 2114, 4)
+    np.zeros((N, 1000, 2)),       # profile_bias_input (or real WCE/Input control values)
+    np.zeros((N, 2))])            # counts_bias_input  (or real control log2 counts)
+# predictions[0]: (N, 1000, 2) logits;  predictions[1]: (N, 1) logcounts
+# convert with vectorized_prediction_to_profile({"profile_predictions": predictions[0], "logcounts_predictions": predictions[1]})
+```
+## Docker image to load and use the models
+`kundajelab/bpnet-atlas` (placeholder — image forthcoming).
+## Code
+- Code: https://github.com/kundajelab/bpnet/
+- Toolbox & downstream analysis: https://github.com/kundajelab/bpnet/wiki
+## License & citation
+External data users may freely download, analyze and publish results based on any
+ENCODE data without restrictions.
+Released under the ENCODE data-use policy. Please cite the ENCODE Project
+Consortium and the model software: BPNet (Avsec et al., Nat Genet 2021).
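
The README's Instructions section assumes that a one-hot `sequence` array already exists. A minimal sketch of building it, together with all-zero control inputs, might look like the following. The helper name `one_hot_encode`, the treatment of `N` (or any non-ACGT character) as an all-zero row, and the random example sequences are illustrative assumptions, not part of the BPNet code base.

```python
import numpy as np

INPUT_LEN = 2114   # "input_len" in config.json
OUTPUT_LEN = 1000  # "output_profile_len" in config.json

# Column order [A, C, G, T], as stated in the README comments.
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seq):
    """Hypothetical helper: encode a DNA string as a (len(seq), 4) float32 array.

    Assumption: characters other than A/C/G/T (e.g. N) become all-zero rows.
    """
    encoded = np.zeros((len(seq), 4), dtype="float32")
    for position, base in enumerate(seq.upper()):
        index = BASE_TO_INDEX.get(base)
        if index is not None:
            encoded[position, index] = 1.0
    return encoded


# Example batch of two random sequences (illustrative data only).
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=INPUT_LEN)) for _ in range(2)]
sequence = np.stack([one_hot_encode(s) for s in seqs])  # (N, 2114, 4)

# Control inputs passed as zeros, which the README says is allowed.
n = sequence.shape[0]
profile_bias_input = np.zeros((n, OUTPUT_LEN, 2), dtype="float32")  # (N, 1000, 2)
counts_bias_input = np.zeros((n, 2), dtype="float32")               # (N, 2)
```

The three arrays have the shapes that the README's loading snippets expect for `sequence`, `profile_bias_input`, and `counts_bias_input`.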

config.json ADDED

	@@ -0,0 +1,45 @@

+{
+    "input_len": 2114,
+    "output_profile_len": 1000,
+    "motif_module_params": {
+        "filters": [64],
+        "kernel_sizes": [21],
+        "padding": "valid"
+    },
+    "syntax_module_params": {
+        "num_dilation_layers": 8,
+        "filters": 64,
+        "kernel_size": 3,
+        "padding": "valid",
+        "pre_activation_residual_unit": true
+    },
+    "profile_head_params": {
+        "filters": 1,
+        "kernel_size":  75,
+        "padding": "valid"
+    },
+    "counts_head_params": {
+        "filters": 1,
+        "kernel_size":  75,
+        "padding": "valid",
+        "units": [1],
+        "activations": ["linear"],
+        "dropouts": [0]
+    },
+    "profile_bias_module_params": {
+        "kernel_sizes": [1]
+    },
+    "counts_bias_module_params": {
+    },
+    "use_attribution_prior": false,
+    "attribution_prior_params": {
+        "frequency_limit": 150,
+        "limit_softness": 0.2,
+        "grad_smooth_sigma": 3,
+        "profile_grad_loss_weight": 200,
+        "counts_grad_loss_weight": 100
+    },
+    "loss_weights": [1, 89.23013136288998],
+    "counts_loss": "MSE"
+}
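
As a sanity check on the numbers in `config.json`, the sketch below works out how a 2114 bp input could shrink to a 1000 bp profile when every convolution uses `"padding": "valid"`. The dilation schedule (2, 4, ..., 256, doubling per layer) is an assumption, since the config gives only `num_dilation_layers`; the function names are hypothetical. Under that assumption the arithmetic matches `output_profile_len`.

```python
def valid_conv_output_len(length, kernel_size, dilation=1):
    """Length after a stride-1 'valid' 1D convolution."""
    return length - dilation * (kernel_size - 1)


def bpnet_profile_output_len(input_len=2114, motif_kernel=21,
                             num_dilation_layers=8, syntax_kernel=3,
                             profile_kernel=75):
    """Hypothetical helper: trace the sequence length through the layers."""
    # Motif module: one conv with kernel 21 -> 2114 - 20 = 2094
    length = valid_conv_output_len(input_len, motif_kernel)
    # Syntax module. Assumed dilation schedule: 2, 4, ..., 2**num_dilation_layers.
    for layer in range(1, num_dilation_layers + 1):
        length = valid_conv_output_len(length, syntax_kernel, dilation=2 ** layer)
    # Profile head: one conv with kernel 75 removes another 74 positions.
    return valid_conv_output_len(length, profile_kernel)


print(bpnet_profile_output_len())  # 1000
```

The dilated layers remove 2 × (2 + 4 + ... + 256) = 1020 positions in total, so 2114 − 20 − 1020 − 74 = 1000.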

fold_0/model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3513b74c44b32181fab8ade2a8ba4a33fdb7a8e6d7c8fb0b67969e31b370c38d
+size 561784
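
Each large file in this commit is stored as a Git LFS pointer like the one above: a `version` line, an `oid sha256:` digest, and a `size` in bytes. The following sketch shows one way to check a downloaded file against its pointer. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the file is assumed to fit in memory.

```python
import hashlib


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if raw bytes `data` have the size and SHA-256 the pointer records."""
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


# Pointer contents copied from the fold_0/model.h5 hunk in this commit.
pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:3513b74c44b32181fab8ade2a8ba4a33fdb7a8e6d7c8fb0b67969e31b370c38d
size 561784
""")
```

With the real file on disk, `matches_pointer(open("fold_0/model.h5", "rb").read(), pointer)` should return `True` if the download is intact.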

fold_0/saved_model/saved_model.pb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a82bf37571c9946c597eea36052307470566aeaa79d9e24648605812fea1876
+size 750201

fold_0/saved_model/variables/variables.data-00000-of-00001 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:94c2ca1e39532719eb1d297cee87848a1b515e83f31a1f9ff790da0c36e9aec4
+size 1393322

fold_0/saved_model/variables/variables.index ADDED

Binary file (5.77 kB).

fold_1/model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2c070ee20eacb4fa69effba2c1cdfb1b28e4a45645996d919d3b9bf9c041573
+size 561784

fold_1/saved_model/saved_model.pb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dc1dfa16250a5378ec2f847a231f76619c9669273ba0a441abffc09f7711f516
+size 750201

fold_1/saved_model/variables/variables.data-00000-of-00001 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a44ac3aa8e252b9ad451134edd4b59d522ec909b74e54dd7c6155782acd8bf7
+size 1393322

fold_1/saved_model/variables/variables.index ADDED

Binary file (5.77 kB).

fold_2/model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:09d671d80e01d8fac60074eed8708eea28c12f83f77f328fd96b246af53a38b2
+size 561784

fold_2/saved_model/saved_model.pb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d455c0ab3d2939b7ec9d358a6f657c27a62c1b449a78813aeb941e3ec61c73f8
+size 750201

fold_2/saved_model/variables/variables.data-00000-of-00001 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:131550022c0fbd523e3a8f7e05cfb6aed4ba7ca3adac755b4c3c1097cdb710d7
+size 1393322

fold_2/saved_model/variables/variables.index ADDED

Binary file (5.77 kB).

fold_3/model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:73d503552a17b7e4977b56158aff03b7919ae0060fd0f00f32f91e95165a6e91
+size 561784

fold_3/saved_model/saved_model.pb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d6acf829a5e3e4aaa762e295e55804ea2a2e672907be0baa21d2aca1d0ec3d7e
+size 750201

fold_3/saved_model/variables/variables.data-00000-of-00001 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3b2feea924fa491b7d976e9fea6b73d413c8c6b9656268378fd34fa7cdc1ff79
+size 1393322

fold_3/saved_model/variables/variables.index ADDED

Binary file (5.77 kB).

fold_4/model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca02ce9190fdd11001714dab1653fac5d89f326836bdeac4560d543f29505022
+size 561784

fold_4/saved_model/saved_model.pb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:29ceaf25ef0bd8cb282593dec1df9be4ba4fafea1c02b79d73f3c0f8858b0d72
+size 750201

fold_4/saved_model/variables/variables.data-00000-of-00001 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88c2ef6838726dac3c9b59067bc824d6bab7ad350d561d300208576f16d22534
+size 1393322

fold_4/saved_model/variables/variables.index ADDED

Binary file (5.77 kB).