VoCo (3D medical-image SwinUNETR-encoder foundation backbone) -- VoCo-H SwinUNETR encoder (feature_size 192)

Description

VoCo (Volume Contrast), ported to JAX / Equinox from the upstream PyTorch release. VoCo self-supervises a MONAI SwinUNETR on ~160K 3D medical volumes using geometric context priors, producing large generalist encoders (31M-1.2B params; B/L/H = feature_size 48/96/192). The released SSL_head checkpoints are the pretrained SwinUNETR encoder path -- the SwinViT transformer encoder (use_v2) plus the UNETR residual conv blocks -- with the decoder / segmentation head left for downstream fine-tuning. This port exposes that encoder as a transfer backbone whose multi-scale skip features are the representation.

Intended use

Same intended use as the L variant, but with the larger feature_size 192 encoder (skip-feature channels 192, 192, 384, 768, 3072). It uses the same Volume Contrast pretraining and is the highest-capacity public VoCo backbone in this import. Encoder-only.
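The channel widths quoted for this variant follow a simple pattern in feature_size. As a minimal sketch, assuming the five skip features scale feature_size by 1, 1, 2, 4 and 16 (multipliers inferred from the H widths listed here, not read from the ilex source), the widths for each variant could be computed as follows. The helper name `skip_channels` is hypothetical.

```python
# Hypothetical helper: not part of ilex or MONAI.
# Multipliers are inferred from the H-variant widths (192, 192, 384, 768, 3072).
SKIP_MULTIPLIERS = (1, 1, 2, 4, 16)

# feature_size per variant, as stated in the description (B/L/H = 48/96/192).
FEATURE_SIZES = {"B": 48, "L": 96, "H": 192}


def skip_channels(feature_size: int) -> tuple[int, ...]:
    """Return the assumed channel width of each multi-scale skip feature."""
    return tuple(feature_size * m for m in SKIP_MULTIPLIERS)


if __name__ == "__main__":
    for name, fs in FEATURE_SIZES.items():
        print(name, skip_channels(fs))
```

Only the H widths are confirmed by this card; the B and L values printed by the sketch are extrapolations under the stated assumption.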

Usage

from ilex.models.voco import VoCoSwinUNETR
model = VoCoSwinUNETR.from_pretrained('ilex-hub/voco.h.1')
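
MONAI's SwinUNETR requires every spatial dimension of the input to be divisible by 32. Whether the ilex port enforces the same constraint is an assumption here, as is the channel-first (C, D, H, W) layout. The sketch below computes symmetric pad widths in the form accepted by `jax.numpy.pad` or `numpy.pad`; the helper name `pad_widths_to_multiple` is hypothetical, and the forward call is left out because its signature is not documented in this card.

```python
def pad_widths_to_multiple(spatial_shape, multiple=32):
    """Return ((before, after), ...) padding so each dim becomes a multiple."""
    widths = []
    for size in spatial_shape:
        total = (-size) % multiple  # extra voxels needed along this axis
        before = total // 2
        widths.append((before, total - before))
    return tuple(widths)


if __name__ == "__main__":
    spatial = (100, 96, 70)  # example D, H, W of a cropped volume
    pads = pad_widths_to_multiple(spatial)
    # Prepend (0, 0) for the channel axis before passing to jnp.pad / np.pad.
    print(((0, 0),) + pads)
```

Padding symmetrically keeps the anatomy centred; cropping to the nearest lower multiple is an alternative when the volume border carries no useful signal.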

Authors

Wu L., et al.

Citation

Wu L., Zhuang J., Chen H., et al. (2024). VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis. CVPR 2024. arXiv:2402.17300.
Wu L., et al. (2025). Large-Scale 3D Medical Image Pre-training with Geometric Context Priors. TPAMI 2025. arXiv:2410.09890.
Backbone: MONAI SwinUNETR (Hatamizadeh A., et al. 2022; arXiv:2201.01266).

References

Wu L., Zhuang J., Chen H., et al. (2024). VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis. CVPR 2024. arXiv:2402.17300.
Wu L., et al. (2025). Large-Scale 3D Medical Image Pre-training with Geometric Context Priors. TPAMI 2025. arXiv:2410.09890.
Weights: https://huggingface.co/Luffy503/VoCo ; code: https://github.com/Luffy03/Large-Scale-Medical

License

HF Hub license tag: apache-2.0

Effective terms: Apache-2.0 (the VoCo authors) on both the code (https://github.com/Luffy03/Large-Scale-Medical) and the released pretrained checkpoints (https://huggingface.co/Luffy503/VoCo). The MONAI SwinUNETR backbone is itself Apache-2.0. No commercial restrictions; no gating required. The ilex JAX / Equinox port code is separately licensed under Apache-2.0 / GPL-3.0.

Upstream license reference: https://github.com/Luffy03/Large-Scale-Medical/blob/main/LICENSE

Copyright

Network architecture (MONAI SwinUNETR) and pretrained weights: copyright (c) the VoCo authors, released under the Apache-2.0 License. JAX / Equinox port: copyright (c) the ilex authors, released under the Apache-2.0 / GPL-3.0 dual license used by ilex itself.

Upstream source

Original weights / reference implementation: https://huggingface.co/Luffy503/VoCo

Provenance

This artefact was produced by ilex's save/load pipeline. The architecture is implemented in ilex.models.voco.VoCoSwinUNETR and the weights have been converted from their upstream format. See the upstream source above for the canonical reference.

Model size

0.8B params (Safetensors, tensor type F32)
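
For planning device memory, the parameter count and tensor type reported here give a rough weight footprint. This is a back-of-the-envelope sketch: 0.8B is a rounded figure, and activations, optimizer state, and framework overhead are not included. The helper name `weight_bytes` is hypothetical.

```python
# Bytes per element for common checkpoint tensor types.
BYTES_PER_DTYPE = {"F32": 4, "BF16": 2, "F16": 2}


def weight_bytes(n_params: float, dtype: str = "F32") -> float:
    """Approximate bytes needed to hold the weights alone."""
    return n_params * BYTES_PER_DTYPE[dtype]


if __name__ == "__main__":
    n_params = 0.8e9  # rounded count reported for this checkpoint
    print(f"{weight_bytes(n_params) / 1e9:.1f} GB")     # decimal gigabytes
    print(f"{weight_bytes(n_params) / 2**30:.2f} GiB")  # binary gibibytes
```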
