Bucket: zareenz741/BrainAge-Golden-Preprocessed-buckety

25 GB, 6,055 files, updated 6 days ago

| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| manifests | | 6 days ago | 2 items |
| logs | | 6 days ago | 1 item |
| cache | | 6 days ago | 6,050 items |
| README.md | 2.86 kB xet | 6 days ago | 90ee7bb4 |
| .gitattributes | 2.5 kB xet | 6 days ago | 738f1125 |

README.md

BrainAge Golden Preprocessed Cache

6,050 preprocessed brain MRI tensors ready for training a brain-age prediction model. Skip the 40+ hour preprocessing step and jump straight to model training.

What's inside

Each .pt file (one per subject) contains:

| Key | Type | Shape | Description |
|---|---|---|---|
| `volume` | float16 | (128, 144, 112) | Z-normed T1w brain in MNI space, trilinear-resized |
| `tab` | float32 | (86,) | 70 regional volumes (scaled as log1p(v)/12) + 3 sex one-hot + 13 site one-hot |
| `age` | float32 | scalar | Chronological age in years |
| `meta` | dict | n/a | subject_id, site, sex, age, split |
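The 86-dimensional `tab` vector packs three groups of features. A minimal sketch of splitting it back into those groups follows. It assumes the groups are stored in the order the table lists them (regions, then sex, then site), which the README does not state explicitly; `split_tab` is a hypothetical helper, not part of the dataset's tooling.

```python
from typing import Sequence, Tuple

# Group sizes taken from the table: 70 + 3 + 13 = 86.
N_REGIONS, N_SEX, N_SITE = 70, 3, 13


def split_tab(tab: Sequence) -> Tuple[Sequence, Sequence, Sequence]:
    """Split an 86-long vector into (regions, sex one-hot, site one-hot).

    Assumes the layout is regions first, then sex, then site.
    """
    expected = N_REGIONS + N_SEX + N_SITE
    if len(tab) != expected:
        raise ValueError(f"expected {expected} values, got {len(tab)}")
    regions = tab[:N_REGIONS]
    sex = tab[N_REGIONS:N_REGIONS + N_SEX]
    site = tab[N_REGIONS + N_SEX:]
    return regions, sex, site
```

Because it only uses `len` and slicing, the same function should work on a 1-D tensor such as `data["tab"]` as well as on a plain list.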

Stats

| Metric | Value |
|---|---|
| Total subjects | 6,050 |
| Age range | 0 – 86 years |
| Source datasets | 12 (BCP, Calgary, ds002726, ds000248, PTBP, IXI, MPI-Leipzig, AOMIC, NKI-Rockland, ABIDE-I, ABIDE-II, ADHD-200) |
| Volume shape | 128 × 144 × 112 (D × H × W) |
| Tabular dim | 86 (70 regions + 3 sex + 13 site) |
| File size | ~4 MB each |
| Total size | ~24 GB |
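The per-file and total sizes follow from the tensor shapes and dtypes. The short calculation below reproduces them as a sanity check. It counts only the raw `volume` and `tab` payloads and ignores serialization overhead and the `meta` dict, so real files will be slightly larger.

```python
# Raw payload size of one subject, from the shapes and dtypes listed above.
D, H, W = 128, 144, 112
BYTES_FLOAT16, BYTES_FLOAT32 = 2, 4

volume_bytes = D * H * W * BYTES_FLOAT16   # float16 volume
tab_bytes = 86 * BYTES_FLOAT32             # float32 tabular vector
n_subjects = 6050

total_bytes = n_subjects * (volume_bytes + tab_bytes)

print(f"{volume_bytes / 1e6:.2f} MB per volume")  # 4.13 MB per volume
print(f"{total_bytes / 1e9:.2f} GB in total")     # 24.98 GB in total
```

In binary units the same total is roughly 23.3 GiB, which may explain why the size is quoted as both ~24 GB and 25 GB.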

Preprocessing pipeline applied

Raw T1w NIfTI
  → HD-BET skull-strip (GPU)
  → N4 bias correction (ANTs)
  → Affine registration to MNI152 1mm
  → Z-score intensity normalization
  → Harvard-Oxford atlas segmentation (69 regions)
  → Volume measurement + rescaling to native space
  → Tensor packaging (.pt)
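To illustrate the volume-measurement step, here is a hedged sketch of how regional volume features could be derived from an atlas label map and scaled with the `log1p/12` transform mentioned for `tab`. This is not the pipeline's actual code: it assumes label 0 is background, that volumes are in mm³ on a 1 mm grid, and it skips the rescaling to native space. `region_features` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def region_features(
    labels: Iterable[int],
    voxel_mm3: float = 1.0,   # assumption: 1 mm isotropic voxels (MNI152 1mm)
    scale: float = 12.0,      # divisor from the "log1p/12" note
) -> Dict[int, float]:
    """Map each non-background label to log1p(volume in mm^3) / scale."""
    # Count voxels per region; label 0 is treated as background (assumption).
    counts = Counter(label for label in labels if label != 0)
    return {
        region: math.log1p(n_voxels * voxel_mm3) / scale
        for region, n_voxels in sorted(counts.items())
    }


# Toy label map: three voxels in region 1, one voxel in region 2.
features = region_features([0, 0, 1, 1, 1, 2])
print(features)
```

One possible motivation for the divisor: `log1p(v) / 12` stays at or below 1 for volumes up to about 160,000 mm³ (since e^12 ≈ 162,755), which would keep the features on a scale similar to the one-hot entries.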

Quick start

from huggingface_hub import snapshot_download
import torch

# Download (~24 GB)
snapshot_download(
    "bilalahmad176176/BrainAge-Golden-Preprocessed",
    repo_type="dataset",
    local_dir="cache/"
)

# Load one subject.
# weights_only=False unpickles arbitrary Python objects, so only use it on files you trust.
data = torch.load("cache/cache/IXI002.pt", weights_only=False)
print(data["volume"].shape)  # torch.Size([128, 144, 112]); dtype is float16
print(data["tab"].shape)     # torch.Size([86]); dtype is float32
print(data["age"])            # e.g. 36.2
print(data["meta"])           # {'subject_id': 'IXI002', 'site': 'DataSet-6_IXI', ...}
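For training, the per-subject files can be wrapped in a map-style dataset. The sketch below is one way to do that, assuming the files sit flat in one directory as `*.pt` and each holds the keys shown above. `BrainAgeCache` and its `loader` parameter are hypothetical names introduced here, not part of `pipeline_v2`.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


def _torch_load(path: Path) -> Dict[str, Any]:
    # Imported lazily so the class can be exercised without PyTorch installed.
    import torch
    # weights_only=False unpickles arbitrary objects: trusted files only.
    return torch.load(path, weights_only=False)


class BrainAgeCache:
    """Map-style dataset over per-subject .pt files (illustrative sketch)."""

    def __init__(
        self,
        cache_dir: str,
        loader: Callable[[Path], Dict[str, Any]] = _torch_load,
    ) -> None:
        self.loader = loader
        # Sort for a stable index-to-subject mapping across runs.
        self.paths: List[Path] = sorted(Path(cache_dir).glob("*.pt"))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Tuple[Any, Any, Any]:
        record = self.loader(self.paths[idx])
        return record["volume"], record["tab"], record["age"]
```

With the default loader, `BrainAgeCache("cache/cache")` should be usable with `torch.utils.data.DataLoader`, which only needs `__len__` and `__getitem__` for map-style datasets. A 3D CNN will likely also want the volume cast and given a channel axis, for example `volume.float().unsqueeze(0)`; that step is left out because the expected input layout depends on the model.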

Train a model

# Generate split
python -m pipeline_v2.data_split \
    --manifests Golden-0-to-25/manifest.csv Golden-25plus/manifest.csv \
    --out cache/split.csv

# Train
python -m pipeline_v2.train \
    --cache_dir cache/cache \
    --split_csv cache/split.csv \
    --out_ckpt brainage_sfcn.pt \
    --epochs 60 --batch 4

Raw dataset: bilalahmad176176/BrainAge-Golden-Raw
3D Viewer demo: bilalahmad176176/BrainAge-3D-Viewer

Citation

Please cite the original source studies listed in the raw dataset manifests.

Last updated: Jun 22
