SegVol (text/point/box-promptable 3D segmentation) -- SegVol (MONAI ViT + CLIP text + SAM decoder)

Description

This is SegVol, ported to JAX / Equinox from the upstream PyTorch release. SegVol is a SAM-style promptable segmentation model for volumetric medical images that accepts text (semantic) prompts via a CLIP text encoder in addition to point and box (spatial) prompts. It pairs a MONAI ViT image encoder (perceptron patch embedding) with a SAM prompt encoder and a two-way transformer mask decoder whose mask logits are fused with a text-aligned similarity map. Forward pass: one volume plus any combination of text token ids, a point set, and a box yields mask logits at the input resolution.
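The fusion of decoder mask logits with a text-aligned similarity map can be illustrated with a small NumPy sketch. It assumes the similarity map is a per-voxel dot product between a projected text embedding and the decoder's upscaled feature map, and that fusion is a plain addition; the description above does not state the operator, so treat both as assumptions. The function name, argument names, and shapes are hypothetical and are not part of the ilex API.

```python
import numpy as np

def fuse_text_similarity(mask_logits, upscaled_features, text_embedding):
    """Add a text-aligned similarity map to mask logits (illustrative only).

    mask_logits:       (D, H, W) logits from the mask decoder
    upscaled_features: (C, D, H, W) per-voxel feature map
    text_embedding:    (C,) text embedding projected to the feature width
    Additive fusion is an assumption, not a documented detail.
    """
    channels = upscaled_features.shape[0]
    # Dot product between the text vector and every voxel's feature vector.
    similarity = text_embedding @ upscaled_features.reshape(channels, -1)
    return mask_logits + similarity.reshape(mask_logits.shape)

# Tiny stand-in tensors; real feature maps are far larger.
features = np.ones((4, 2, 3, 3), dtype=np.float32)
text = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)
fused = fuse_text_similarity(np.zeros((2, 3, 3), dtype=np.float32), features, text)
```

Voxels whose features align with the text embedding receive a higher logit, which is how a text prompt can bias the mask toward the named structure.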

Intended use

Promptable 3D segmentation of a single-channel medical volume (intensity-normalised and resampled to 32x256x256 by the upstream processor). Prompts may be any combination of: CLIP token ids (L,) for a text prompt; point coordinates (N, 3) with labels (N,), where 1 = foreground and 0 = background; and a box (6,) given as [x0, y0, z0, x1, y1, z1]. The model returns a single foreground mask-logit volume at the input resolution. String-to-token tokenisation (CLIP BPE) and the zoom-in/zoom-out sliding-window refinement are preprocessing / inference concerns handled outside the model.
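The prompt shapes listed above can be checked before calling the model. The following is a minimal sketch using NumPy; `make_prompts` and the dictionary keys are hypothetical names chosen for illustration, not part of ilex. Two checks go beyond what is stated here and are assumptions: that at least one prompt must be supplied, and that the second box corner is not smaller than the first.

```python
import numpy as np

def make_prompts(token_ids=None, points=None, point_labels=None, box=None):
    """Validate and pack optional prompts (hypothetical helper, not ilex API)."""
    prompts = {}
    if token_ids is not None:
        ids = np.asarray(token_ids, dtype=np.int32)
        if ids.ndim != 1:
            raise ValueError("token ids must have shape (L,)")
        prompts["text"] = ids
    if points is not None:
        if point_labels is None:
            raise ValueError("points need one label each")
        pts = np.asarray(points, dtype=np.float32)
        labels = np.asarray(point_labels, dtype=np.int32)
        if pts.ndim != 2 or pts.shape[1] != 3:
            raise ValueError("points must have shape (N, 3)")
        if labels.shape != (pts.shape[0],):
            raise ValueError("labels must have shape (N,)")
        if not np.isin(labels, (0, 1)).all():
            raise ValueError("labels must be 1 (foreground) or 0 (background)")
        prompts["points"] = pts
        prompts["point_labels"] = labels
    if box is not None:
        corners = np.asarray(box, dtype=np.float32)
        if corners.shape != (6,):
            raise ValueError("box must have shape (6,)")
        # Assumption: [x0, y0, z0] is the lower corner, [x1, y1, z1] the upper.
        if np.any(corners[3:] < corners[:3]):
            raise ValueError("box upper corner is below its lower corner")
        prompts["box"] = corners
    if not prompts:
        # Assumption: the model needs at least one prompt to be useful.
        raise ValueError("supply at least one prompt")
    return prompts

# One foreground point, one background point, and a box; no text prompt.
prompts = make_prompts(points=[[1, 2, 3], [4, 5, 6]], point_labels=[1, 0],
                       box=[0, 0, 0, 10, 20, 30])
```

Validating shapes up front turns a silent broadcasting mistake into an immediate error, which is useful because JAX-traced code tends to fail late and with less readable messages.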

Usage

from ilex.models.segvol import SegVol
model = SegVol.from_pretrained('ilex-hub/segvol.1')
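Because the model returns mask logits rather than a binary mask, a thresholding step usually follows inference. The sketch below uses a stand-in NumPy array in place of real model output, since the forward call signature is not documented here. `logits_to_mask` is a hypothetical helper, and the 0.5 probability threshold (equivalent to logits > 0) is an assumed default rather than a value taken from upstream.

```python
import numpy as np

def logits_to_mask(logits, threshold=0.5):
    """Turn a mask-logit volume into a boolean foreground mask."""
    logits = np.asarray(logits, dtype=np.float32)
    probabilities = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid
    return probabilities > threshold

# Stand-in for model output at the 32x256x256 input resolution.
logits = np.full((32, 256, 256), -4.0, dtype=np.float32)
logits[10:20, 100:150, 100:150] = 4.0  # a synthetic foreground block

mask = logits_to_mask(logits)
foreground_voxels = int(mask.sum())
```

Raising the threshold trades recall for precision; for small structures it may be worth inspecting the logit volume directly before choosing a cut-off.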

Authors

Du Y., Bai F., Huang T., Zhao B.

Citation

Du Y., Bai F., Huang T., Zhao B. (2024). SegVol: Universal and Interactive Volumetric Medical Image Segmentation. NeurIPS 2024. arXiv:2311.13385. Built on Kirillov A., et al. (2023), Segment Anything, ICCV 2023, arXiv:2304.02643.

License

HF Hub license tag: mit

Effective terms: MIT (the SegVol authors, BAAI) on both the network code and the released BAAI/SegVol weights. The underlying Segment Anything design is Meta's (Apache-2.0); the text encoder is the HuggingFace transformers CLIPTextModel. No commercial restrictions; no gating required. The ilex JAX / Equinox port code is separately licensed under Apache-2.0 / GPL-3.0.

Upstream license reference: https://github.com/BAAI-DCAI/SegVol/blob/main/LICENSE

Copyright

Network architecture and pretrained weights: copyright (c) the SegVol authors (BAAI), released under the MIT License. The underlying Segment Anything design is Meta's (Apache-2.0); the CLIP text encoder is the HuggingFace transformers CLIPTextModel. JAX / Equinox port: copyright (c) the ilex authors, released under the Apache-2.0 / GPL-3.0 dual license used by ilex itself.

Upstream source

Original weights / reference implementation: https://github.com/BAAI-DCAI/SegVol

Provenance

This artefact was produced by ilex's save/load pipeline. The architecture is implemented in ilex.models.segvol.SegVol and the weights have been converted from their upstream format. See the upstream source above for the canonical reference.

Model size: 0.2B params (Safetensors; tensor type F32)