
BrainAge Golden Preprocessed Cache

6,050 preprocessed brain MRI tensors ready for training a brain-age prediction model. Skip the 40+ hour preprocessing step and jump straight to model training.

What's inside

Each .pt file (one per subject) contains:

| Key | Type | Shape | Description |
|---|---|---|---|
| volume | float16 | (128, 144, 112) | Z-normed T1w brain in MNI space, trilinear-resized |
| tab | float32 | (86,) | 70 regional volumes (log1p/12) + 3 sex one-hot + 13 site one-hot |
| age | float32 | scalar | Chronological age in years |
| meta | dict | - | subject_id, site, sex, age, split |
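The 86-dim `tab` vector can be sliced back into its three blocks. The sketch below assumes the blocks are stored in the order the description lists them (regions, then sex, then site) and that "log1p/12" means `log1p(v) / 12`; neither is confirmed by the files themselves, and `split_tab` and `decode_volumes` are hypothetical helper names.

```python
import torch

# Assumed layout, taken from the order in the description:
# [0:70] regional volumes, [70:73] sex one-hot, [73:86] site one-hot.
N_REGIONS, N_SEX, N_SITE = 70, 3, 13


def split_tab(tab: torch.Tensor):
    """Split the 86-dim tabular vector into its three assumed blocks."""
    if tuple(tab.shape) != (N_REGIONS + N_SEX + N_SITE,):
        raise ValueError(f"expected shape (86,), got {tuple(tab.shape)}")
    return torch.split(tab, [N_REGIONS, N_SEX, N_SITE])


def decode_volumes(regions: torch.Tensor) -> torch.Tensor:
    """Invert the assumed encoding x = log1p(v) / 12."""
    return torch.expm1(regions * 12.0)


# Synthetic stand-in for data["tab"]; real values come from the .pt files.
raw = torch.full((N_REGIONS,), 5000.0)
tab = torch.cat([
    torch.log1p(raw) / 12.0,          # encoded regional volumes
    torch.tensor([0.0, 1.0, 0.0]),    # sex one-hot
    torch.eye(N_SITE)[4],             # site one-hot
])

regions, sex, site = split_tab(tab)
print(sex.argmax().item(), site.argmax().item())
print(decode_volumes(regions)[:3])
```

Check the category order of the sex and site one-hots against `meta` before relying on the indices.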

Stats

| Metric | Value |
|---|---|
| Total subjects | 6,050 |
| Age range | 0 – 86 years |
| Source datasets | 12 (BCP, Calgary, ds002726, ds000248, PTBP, IXI, MPI-Leipzig, AOMIC, NKI-Rockland, ABIDE-I, ABIDE-II, ADHD-200) |
| Volume shape | 128 × 144 × 112 (D × H × W) |
| Tabular dim | 86 (70 regions + 3 sex + 13 site) |
| File size | ~4 MB each |
| Total size | ~24 GB |
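As a sanity check, the size figures follow from the tensor shapes and dtypes. This estimate counts only the raw `volume` and `tab` payloads and ignores the `meta` dict and any serialization overhead, so treat it as approximate.

```python
# Back-of-the-envelope check of the per-file and total size figures.
D, H, W = 128, 144, 112
voxels = D * H * W                  # voxels per volume
volume_bytes = voxels * 2           # float16 = 2 bytes per voxel
tab_bytes = 86 * 4                  # float32 = 4 bytes per entry

per_file_mb = (volume_bytes + tab_bytes) / 1e6
total_gb = per_file_mb * 6050 / 1e3

print(f"{per_file_mb:.2f} MB per file, {total_gb:.1f} GB total")
```

This comes out at about 4.13 MB per file and roughly 25 GB in decimal units (about 23.3 GiB), which brackets the ~24 GB quoted above.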

Preprocessing pipeline applied

Raw T1w NIfTI
  → HD-BET skull-strip (GPU)
  → N4 bias correction (ANTs)
  → Affine registration to MNI152 1mm
  → Z-score intensity normalization
  → Harvard-Oxford atlas segmentation (69 regions)
  → Volume measurement + rescaling to native space
  → Tensor packaging (.pt)
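To make the intensity normalization and resizing steps concrete, here is a minimal sketch in PyTorch. It is not the code that produced this cache: it assumes the background is exactly zero after skull-stripping, that statistics are computed over brain voxels only, and that resizing follows normalization. The function name `zscore_and_resize` is hypothetical.

```python
import torch
import torch.nn.functional as F


def zscore_and_resize(vol: torch.Tensor, out_shape=(128, 144, 112)) -> torch.Tensor:
    """Z-score a 3D volume over its nonzero voxels, then resize trilinearly."""
    vol = vol.float()
    mask = vol != 0                          # assumption: background is exactly 0
    mean = vol[mask].mean()
    std = vol[mask].std().clamp_min(1e-6)    # guard against division by zero
    normed = torch.where(mask, (vol - mean) / std, torch.zeros_like(vol))
    # F.interpolate expects (N, C, D, H, W) input for trilinear mode.
    resized = F.interpolate(normed[None, None], size=out_shape,
                            mode="trilinear", align_corners=False)
    return resized[0, 0].half()              # store as float16, as in the cache


# Synthetic "brain": a nonzero box inside a zero background.
raw = torch.zeros(91, 109, 91)
raw[10:80, 10:100, 10:80] = torch.rand(70, 90, 70) * 1000 + 1
out = zscore_and_resize(raw)
print(out.shape, out.dtype)
```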

Quick start

```python
from huggingface_hub import snapshot_download
import torch

# Download the full dataset (~24 GB) into ./cache/
snapshot_download(
    "bilalahmad176176/BrainAge-Golden-Preprocessed",
    repo_type="dataset",
    local_dir="cache/"
)

# Load one subject. The tensors live in the repo's cache/ folder, hence cache/cache/.
# weights_only=False unpickles arbitrary objects, so only load files you trust.
data = torch.load("cache/cache/IXI002.pt", weights_only=False)
print(data["volume"].shape)  # torch.Size([128, 144, 112]), dtype float16
print(data["tab"].shape)     # torch.Size([86]), dtype float32
print(data["age"])           # e.g. 36.2
print(data["meta"])          # {'subject_id': 'IXI002', 'site': 'DataSet-6_IXI', ...}
```
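For training, the per-subject files can be wrapped in a PyTorch `Dataset`. The class below is a hypothetical sketch (`BrainAgeCache` is not part of this repo) that relies only on the keys documented above. The demo writes a few synthetic files so it runs without the download.

```python
import tempfile
from pathlib import Path

import torch
from torch.utils.data import DataLoader, Dataset


class BrainAgeCache(Dataset):
    """Hypothetical wrapper: one sample per .pt file in cache_dir."""

    def __init__(self, cache_dir):
        self.files = sorted(Path(cache_dir).glob("*.pt"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        d = torch.load(self.files[idx], weights_only=False)
        vol = d["volume"].float().unsqueeze(0)   # (1, D, H, W), cast up from float16
        age = torch.as_tensor(d["age"], dtype=torch.float32)
        return vol, d["tab"], age


# Demo on synthetic files with the documented keys and shapes.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        sample = {
            "volume": torch.zeros(128, 144, 112, dtype=torch.float16),
            "tab": torch.zeros(86),
            "age": 30.0 + i,
            "meta": {"subject_id": f"demo{i}"},
        }
        torch.save(sample, Path(tmp) / f"demo{i}.pt")

    loader = DataLoader(BrainAgeCache(tmp), batch_size=2)
    vol, tab, age = next(iter(loader))
    print(vol.shape, tab.shape, age)
```

With the real data, point `BrainAgeCache` at `cache/cache` instead of the temporary directory.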

Train a model

```bash
# Generate split
python -m pipeline_v2.data_split \
    --manifests Golden-0-to-25/manifest.csv Golden-25plus/manifest.csv \
    --out cache/split.csv

# Train
python -m pipeline_v2.train \
    --cache_dir cache/cache \
    --split_csv cache/split.csv \
    --out_ckpt brainage_sfcn.pt \
    --epochs 60 --batch 4
```
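If the `pipeline_v2` package is not at hand, a single training step on these tensors can be sketched as follows. This is an illustrative toy model, not the architecture used by `pipeline_v2.train`: `TinyAgeNet`, the layer sizes, and the L1 loss are all assumptions, and the inputs are small random stand-ins rather than real volumes.

```python
import torch
import torch.nn as nn


class TinyAgeNet(nn.Module):
    """Toy 3D CNN that fuses image features with the 86-dim tabular vector."""

    def __init__(self, tab_dim: int = 86):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),   # -> (batch, 16)
        )
        self.head = nn.Linear(16 + tab_dim, 1)

    def forward(self, vol, tab):
        feats = torch.cat([self.cnn(vol), tab], dim=1)
        return self.head(feats).squeeze(1)           # predicted age, shape (batch,)


model = TinyAgeNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-ins for one batch of 4; real volumes are (1, 128, 144, 112).
vol = torch.randn(4, 1, 32, 36, 28)
tab = torch.randn(4, 86)
age = torch.rand(4) * 86

pred = model(vol, tab)
loss = nn.functional.l1_loss(pred, age)              # mean absolute error in years
opt.zero_grad()
loss.backward()
opt.step()
print(pred.shape, loss.item())
```

The adaptive pooling layer makes the network independent of input size, so the same model accepts the full 128 × 144 × 112 volumes.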

Related

Citation

Please cite the original source studies listed in the raw dataset manifests.
