---
language:
- en
license: apache-2.0
datasets:
- ptb-xl
tags:
- ecg
- cardiology
- signal-processing
- medical
- unet
- clip
- lead-generation
- pytorch
- 1d-unet
- film-conditioning
metrics:
- rmse
pipeline_tag: other
library_name: pytorch
---

# 🫀 ECG Lead Generator — 7 → 12 Leads

A **CLIP-conditioned 1D U-Net** that reconstructs 5 missing precordial ECG leads (V2–V6)
from 7 available leads (I, II, III, aVR, aVL, aVF, V1), enabling full 12-lead ECG synthesis
from reduced-lead recordings.

---

## Clinical Motivation

Standard 12-lead ECGs require 10 body-surface electrodes. In wearables, ambulatory monitoring,
and pre-hospital emergency settings, it may only be feasible to acquire the limb leads plus V1.
This model reconstructs the missing precordial leads (V2–V6) from those 7 leads, using a
vision-language prior from CLIP as conditioning.

---

## Architecture

```
Input [B, 7, 2500]  ──►  1D U-Net (CLIP-FiLM conditioned)  ──►  Output [B, 5, 2500]
                               ▲
                    CLIP-ViT-L/14 (frozen)
                    ECG red-grid image → 1024-d embedding
                    FiLM-injected at every encoder/decoder scale
```

| Component | Detail |
|---|---|
| Backbone | 1D U-Net, 4 encoder + 4 decoder scales |
| Conditioning | CLIP-ViT-L/14 → 1024-d pooler output |
| Conditioning mechanism | FiLM (Feature-wise Linear Modulation) at every scale |
| Parameters | **13,703,507** (LeadGenerator only; CLIP is frozen) |
| Base channels | 64 |
| Sequence length | 2500 samples (5 s @ 500 Hz) |
| Loss | Huber loss |
| Optimiser | AdamW + CosineAnnealingLR |

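As a rough illustration of the loss and optimiser rows above, the sketch below runs a single optimisation step with the hyperparameters listed under Training Details (learning rate 1e-3, weight decay 1e-4, gradient clipping 1.0, 60 epochs). The `Conv1d` stand-in model, the default Huber `delta`, norm-based clipping, and stepping the scheduler once per epoch are all assumptions; the actual training script is not part of this card.

```python
import torch
import torch.nn as nn

# Stand-in for LeadGenerator so the sketch runs on its own (hypothetical).
model = nn.Conv1d(7, 5, kernel_size=3, padding=1)

criterion = nn.HuberLoss()  # default delta=1.0; the delta actually used is not stated
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Assumption: one scheduler step per epoch over the 60 training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)

x = torch.randn(32, 7, 2500)       # one batch of 7-lead inputs
target = torch.randn(32, 5, 2500)  # matching V2–V6 targets

model.train()
optimizer.zero_grad()
loss = criterion(model(x), target)
loss.backward()
# Assumption: "gradient clipping 1.0" means clipping the global gradient norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()  # in a real loop, call this once at the end of each epoch
```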
### Why CLIP conditioning?
Each ECG is rendered as a **red-grid clinical image** (standard paper layout) before being
passed through CLIP-ViT-L. The resulting 1024-d embedding captures morphological patterns
visually and is injected into the U-Net via FiLM — allowing the generator to produce
lead-consistent waveforms conditioned on the global ECG appearance.

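The FiLM mechanism described above can be sketched as follows. This is a minimal illustration, not the code in `model.py`: the class name `FiLM1d`, the single linear projection, and the `1 + gamma` parameterisation are assumptions made for the example. The conditioning vector is mapped to one scale and one shift per channel, which are then broadcast along the time axis.

```python
import torch
import torch.nn as nn


class FiLM1d(nn.Module):
    """Feature-wise Linear Modulation for 1D feature maps (hypothetical sketch)."""

    def __init__(self, cond_dim: int, n_channels: int):
        super().__init__()
        # One linear layer predicts a per-channel scale (gamma) and shift (beta).
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * n_channels)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feats: [B, C, T] U-Net feature map; cond: [B, cond_dim] CLIP embedding
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)  # each [B, C]
        # Broadcast over time. Using (1 + gamma) keeps the layer close to the
        # identity when gamma and beta are near zero (an assumed design choice).
        return (1.0 + gamma.unsqueeze(-1)) * feats + beta.unsqueeze(-1)


film = FiLM1d(cond_dim=1024, n_channels=64)
feats = torch.randn(2, 64, 2500)  # e.g. a feature map with 64 base channels
clip_emb = torch.randn(2, 1024)   # stand-in for the CLIP pooler output
out = film(feats, clip_emb)
print(out.shape)  # torch.Size([2, 64, 2500])
```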
---

## Performance

Evaluated on a held-out 10% split of PTB-XL (500 Hz, 200 records).

| Lead | RMSE ↓ | DTW (normalised) ↓ |
|------|--------|--------------------|
| V2   | 0.41751 | 0.10016           |
| V3   | 0.52274 | 0.10500           |
| V4   | 0.45217 | 0.09813           |
| V5   | 0.35278 | 0.07953           |
| V6   | 0.37252 | 0.09069           |
| **Mean** | **0.42355** | **0.09470** |

> V5 achieves the best reconstruction quality (RMSE 0.353, DTW 0.080) and V3 the worst
> (RMSE 0.523, DTW 0.105). V4 and V6 are themselves reconstruction targets rather than
> model inputs, so V5's advantage cannot be attributed to neighbouring leads in the input.

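For reference, the two metrics in the table can be computed along these lines. This is a sketch only: the evaluation code is not included in this card, and the absolute-difference DTW cost and the division by (n + m) are assumptions, so values produced this way may not match the table exactly. The plain-Python DTW loop is fine for short signals but slow for full 2500-sample leads.

```python
import numpy as np


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error between two equal-length 1D signals."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def dtw_normalised(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(n*m) DTW with absolute-difference cost, divided by (n + m).

    The (n + m) normalisation is an assumption; the card does not state
    which normalisation was used.
    """
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))


# Toy check on a short synthetic signal (not PTB-XL data).
t = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * 3 * t)
pred = np.sin(2 * np.pi * 3 * (t - 0.01))  # same waveform, slightly shifted in time

print(f"RMSE: {rmse(pred, target):.4f}")
print(f"DTW : {dtw_normalised(pred, target):.4f}")
```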
---

## How to Use

### Install dependencies

```bash
pip install torch huggingface_hub safetensors transformers
```

### Load the model

```python
import json
import sys
from pathlib import Path

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "rishsoraganvi/ecg-lead-generator"

# Load config
cfg_path = hf_hub_download(REPO_ID, "config.json")
with open(cfg_path) as f:
    cfg = json.load(f)

# Download model.py from this repo and make it importable
# (alternatively, paste the LeadGenerator class into your own code)
model_py = hf_hub_download(REPO_ID, "model.py")
sys.path.insert(0, str(Path(model_py).parent))
from model import LeadGenerator

model = LeadGenerator(
    ni=cfg["n_in"],      # 7
    no=cfg["n_out"],     # 5
    ch=cfg["base_ch"],   # 64
    cd=cfg["clip_dim"],  # 1024
)

# Load the weights and switch to inference mode
weights_path = hf_hub_download(REPO_ID, "model.safetensors")
model.load_state_dict(load_file(weights_path))
model.eval()
```

### Run inference

```python
import numpy as np
import torch
from transformers import CLIPModel, CLIPProcessor

# 1. Load CLIP (frozen); only the vision tower is used
clip_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
clip_enc   = clip_model.vision_model.eval()
clip_proc  = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# 2. Prepare 7-lead ECG input: numpy array [7, 2500], 500 Hz, z-normalised
#    Leads: I, II, III, aVR, aVL, aVF, V1
ecg_7lead = np.random.randn(7, 2500).astype(np.float32)  # replace with real data

# 3. Render ECG as red-grid image → CLIP embedding
#    render_redgrid() is not defined in this snippet; take it from the notebook
#    or your preprocessing pipeline
img = render_redgrid(ecg_7lead)   # PIL RGB image
inp = clip_proc(images=[img], return_tensors="pt")
with torch.no_grad():
    clip_emb = clip_enc(**inp).pooler_output   # [1, 1024]

# 4. Generate missing leads (`model` is the LeadGenerator loaded in the previous snippet)
x = torch.from_numpy(ecg_7lead).unsqueeze(0)   # [1, 7, 2500]
with torch.no_grad():
    pred = model(x, clip_emb)                  # [1, 5, 2500]

# pred contains V2, V3, V4, V5, V6
```

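To obtain the full 12-lead recording, the generated leads can be stacked under the measured ones. The helper below is a hypothetical convenience function (`assemble_12_lead` is not part of `model.py`) and assumes the channel orders I, II, III, aVR, aVL, aVF, V1 for the input and V2–V6 for the output. In practice, `pred_5lead` would be `pred[0].numpy()` from the inference snippet.

```python
import numpy as np

INPUT_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1"]
GENERATED_LEADS = ["V2", "V3", "V4", "V5", "V6"]


def assemble_12_lead(ecg_7lead: np.ndarray, pred_5lead: np.ndarray) -> dict:
    """Stack measured and generated leads into one [12, T] array with lead names.

    Assumes both arrays follow the channel orders listed above (hypothetical helper).
    """
    if ecg_7lead.shape[0] != 7 or pred_5lead.shape[0] != 5:
        raise ValueError("expected shapes [7, T] and [5, T]")
    if ecg_7lead.shape[1] != pred_5lead.shape[1]:
        raise ValueError("input and prediction must have the same length")
    signals = np.concatenate([ecg_7lead, pred_5lead], axis=0)  # [12, T]
    return {"leads": INPUT_LEADS + GENERATED_LEADS, "signals": signals}


# Stand-ins for the measured input and the model output.
ecg_7lead = np.zeros((7, 2500), dtype=np.float32)
pred_5lead = np.ones((5, 2500), dtype=np.float32)

ecg_12 = assemble_12_lead(ecg_7lead, pred_5lead)
print(ecg_12["signals"].shape)  # (12, 2500)
print(ecg_12["leads"][6:8])     # ['V1', 'V2']
```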
---

## Training Details

| Setting | Value |
|---|---|
| Dataset | PTB-XL (PhysioNet, v1.0.3) |
| Sampling rate | 500 Hz |
| Records used | 2,000 (subset of PTB-XL) |
| Train / Val / Test | 80 / 10 / 10 % (1,600 / 200 / 200 records) |
| Preprocessing | Butterworth bandpass 0.5–40 Hz + z-score normalisation |
| Epochs | 60 |
| Batch size | 32 |
| Learning rate | 1e-3 |
| Weight decay | 1e-4 |
| Gradient clipping | 1.0 |
| Hardware | Vast.ai A100 GPU |
| CLIP model | `openai/clip-vit-large-patch14` (frozen) |

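The preprocessing row above could be implemented roughly as follows with SciPy (this needs `pip install scipy` in addition to the dependencies listed earlier). The card only specifies a 0.5–40 Hz Butterworth band-pass and z-score normalisation; the filter order, zero-phase filtering, cropping from the start of the record, and per-lead statistics are assumptions of this sketch, and `preprocess_ecg` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def preprocess_ecg(ecg: np.ndarray, fs: float = 500.0, n_samples: int = 2500) -> np.ndarray:
    """Band-pass 0.5–40 Hz, crop to n_samples, then z-score each lead.

    ecg: [n_leads, T] array sampled at fs Hz. Filter order (4), zero-phase
    filtering, cropping from the start and per-lead statistics are assumptions.
    """
    sos = butter(4, [0.5, 40.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, ecg, axis=-1)
    cropped = filtered[:, :n_samples]
    mean = cropped.mean(axis=-1, keepdims=True)
    std = cropped.std(axis=-1, keepdims=True)
    return ((cropped - mean) / (std + 1e-8)).astype(np.float32)


# Synthetic 10 s, 7-lead recording at 500 Hz (stand-in for a PTB-XL record).
rng = np.random.default_rng(0)
raw = rng.standard_normal((7, 5000))
ecg_7lead = preprocess_ecg(raw)
print(ecg_7lead.shape)  # (7, 2500)
```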
---

## Repository Structure

```
ecg-lead-generator/
├── model.safetensors        # Model weights (safetensors format)
├── config.json              # Model configuration
├── model.py                 # LeadGenerator architecture
└── README.md                # This file
```

---

## Limitations

- Trained and evaluated on a 2,000-record subset of PTB-XL; a larger training set is recommended for production use
- Validated on 500 Hz recordings only
- Not validated on all pathological ECG subtypes present in clinical practice
- **Not a medical device** — intended for research and educational purposes only

---

## Citation

If you use this model in your work, please cite PTB-XL:

```bibtex
@article{wagner2020ptb,
  title={PTB-XL, a large publicly available electrocardiography dataset},
  author={Wagner, Patrick and Strodthoff, Nils and Bousseljot, Ralf-Dieter and others},
  journal={Scientific Data},
  volume={7},
  number={1},
  pages={154},
  year={2020},
  publisher={Nature Publishing Group}
}
```

---

## Author

**Rishabh Soraganvi** — [GitHub](https://github.com/rishsoraganvi) · [Hugging Face](https://huggingface.co/rishsoraganvi)