---
title: SpaceFormer Open-Vocab 3D Instance Segmentation
emoji: 🧩
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - 3d
  - point-cloud
  - instance-segmentation
  - open-vocabulary
---

# SpaceFormer — Open-Vocabulary 3D Instance Segmentation (demo)

Proposal-free **open-vocabulary 3D instance segmentation**: a Mask2Former-style query
decoder (learned queries + RoPE) on top of the WarpConvNet `SpaCeFormer` backbone. One
forward pass over an RGB point cloud produces per-query masks and per-query features in the
SigLIP2 vision-language embedding space. Each query is then labeled by matching its feature
against text embeddings of **arbitrary** class names (SigLIP2 text encoder, with prompt
ensembling), so the vocabulary is chosen at inference time rather than fixed at training time.
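
The labeling step amounts to zero-shot classification by cosine similarity, with prompt ensembling on the text side. The NumPy sketch below illustrates that idea under stated assumptions and is not SpaceFormer's implementation. The prompt templates, the softmax temperature, and the `encode_text` callable (a stand-in for the SigLIP2 text encoder) are all hypothetical, and the per-query features are assumed to share the text embeddings' space.

```python
import numpy as np

# Hypothetical prompt templates: the template set SpaceFormer actually ensembles is not given here.
TEMPLATES = ("a {}.", "a photo of a {}.", "a 3D scan of a {}.")


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit L2 norm along `axis` (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def ensemble_text_embeddings(class_names, encode_text, templates=TEMPLATES):
    """Prompt ensembling: embed every template filled with the class name,
    average the unit-normalized embeddings, then renormalize. Returns [C, D].

    `encode_text` stands in for the SigLIP2 text encoder: any callable that maps
    a list of strings to a [len(list), D] array.
    """
    rows = []
    for name in class_names:
        prompts = [t.format(name) for t in templates]
        emb = l2_normalize(np.asarray(encode_text(prompts), dtype=np.float64))
        rows.append(l2_normalize(emb.mean(axis=0)))
    return np.stack(rows)


def classify_queries(query_feats, text_emb, temperature=0.01):
    """Cosine similarity of per-query features [Q, D] against class embeddings [C, D],
    softmaxed over classes. Returns (labels [Q], probs [Q, C]). The temperature is assumed."""
    logits = l2_normalize(np.asarray(query_feats, dtype=np.float64)) @ text_emb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```

In this sketch the vocabulary enters only through `text_emb`. Changing the class names therefore re-runs the cheap text side and leaves the point-cloud forward pass untouched. Averaging the normalized per-template embeddings before renormalizing is the usual CLIP-style way to ensemble prompts.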

Released checkpoint:

| Benchmark | mAP |
|---|---|
| ScanNet200 | 0.1265 |
| ScanNet++ | 0.2217 |
| Replica | 0.2644 |

This repo is the **demo / inference layer**. The model itself lives in WarpConvNet
(`warpconvnet.models.spaceformer`). This repo only adds the Gradio UI (`app.py`), a local
viser demo (`demo_viser.py`), and a CLI inference entry point (`inference.py`).

## Requirements

```bash
pip install -r requirements.txt
```

> **WarpConvNet must be installed with its compiled extension**, either from a pre-built
> wheel or from a source build. It is intentionally not pinned in `requirements.txt`
> because the right build is environment-specific. On first use, `transformers` downloads
> the SigLIP2 text encoder (`google/siglip2-so400m-patch14-224`).
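
Before launching a demo, you can check that the packages import. The helper below is a hypothetical sketch. It only confirms that each module imports. Whether a successful `import warpconvnet` also proves the compiled extension loaded depends on how WarpConvNet loads that extension, which this README does not specify.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: "ErrorType: message"} for the failures.

    A broken compiled extension can surface as ImportError or as OSError (e.g. a
    missing shared library), so any exception is recorded instead of raised.
    """
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

For example, `check_imports(["warpconvnet", "warpconvnet.models.spaceformer", "transformers"])` returns an empty dict when all three modules import cleanly.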

## Live demo (Gradio / HuggingFace Space)

```bash
HF_REPO_ID=chrischoy/SpaCeFormer python app.py
# or a local checkpoint:
SPACEFORMER_CKPT=/path/to/spaceformer_512_siglip2_ssccc.ckpt python app.py
```

Upload a point cloud and enter comma-separated class names. The app returns an interactive
3D view colored by predicted instance, plus a ranked table of instances. To deploy it as a
**HuggingFace Space**, create a **GPU** Gradio Space, install WarpConvNet and
`requirements.txt` in the image, and set the Space variable `HF_REPO_ID` (and, optionally,
`HF_FILENAME`, which defaults to `spaceformer_512_siglip2_ssccc.ckpt`).
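
A rough sketch of how the vocabulary box and the checkpoint variables could be handled. All helper names are hypothetical. The precedence of `SPACEFORMER_CKPT` over `HF_REPO_ID`, and the de-duplication of class names, are assumptions rather than documented `app.py` behavior. Only the variable names and the default filename come from this README. `hf_hub_download` is the standard `huggingface_hub` download helper.

```python
import os

DEFAULT_HF_FILENAME = "spaceformer_512_siglip2_ssccc.ckpt"


def parse_class_names(text):
    """Split a comma-separated vocabulary, trimming whitespace and dropping
    empty entries and repeats (the de-duplication is an assumed convenience)."""
    names = []
    for part in text.split(","):
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def resolve_checkpoint(env=None):
    """Return ("local", path) or ("hub", repo_id, filename).

    Assumed precedence (not confirmed by app.py): SPACEFORMER_CKPT wins over HF_REPO_ID.
    """
    env = os.environ if env is None else env
    local = env.get("SPACEFORMER_CKPT")
    if local:
        return ("local", local)
    repo_id = env.get("HF_REPO_ID")
    if repo_id:
        return ("hub", repo_id, env.get("HF_FILENAME", DEFAULT_HF_FILENAME))
    raise RuntimeError("set SPACEFORMER_CKPT or HF_REPO_ID")


def checkpoint_path(spec):
    """Turn a resolved spec into a local file path, downloading from the Hub if needed."""
    if spec[0] == "local":
        return spec[1]
    from huggingface_hub import hf_hub_download  # imported lazily; caches the file locally

    _, repo_id, filename = spec
    return hf_hub_download(repo_id=repo_id, filename=filename)
```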

## Local demo (viser)

An interactive, self-contained local demo that takes **text class names**, runs
segmentation, and visualizes the result in the browser with
[viser](https://viser.studio). Each predicted instance gets a distinct color,
unassigned points stay grey, and a GUI panel lists the top instances.
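
Turning masks into per-point colors can be sketched as follows. This is an illustration, not `demo_viser.py` itself. The sketch assumes masks arrive as a boolean `[Q, M]` array over the model's output points with one score per instance. The "highest score wins" rule for overlapping masks, the score threshold, and the golden-ratio hue palette are all assumptions.

```python
import colorsys

import numpy as np

GREY = np.array([128, 128, 128], dtype=np.uint8)  # color for points no kept instance covers


def instance_palette(n):
    """n distinct RGB colors (uint8), stepping the hue by the golden-ratio conjugate."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb((i * 0.618033988749895) % 1.0, 0.75, 0.95)
        colors.append([int(r * 255), int(g * 255), int(b * 255)])
    return np.array(colors, dtype=np.uint8).reshape(n, 3)


def color_points(masks, scores, score_thresh=0.1):
    """masks: [Q, M] bool over the model's output points; scores: [Q] per-instance scores.

    Each point takes the color of the highest-scoring instance (score >= score_thresh)
    whose mask covers it; uncovered points stay grey.
    Returns (colors [M, 3] uint8, instance id per point [M], with -1 meaning unassigned).
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    num_points = masks.shape[1]
    colors = np.tile(GREY, (num_points, 1))
    if masks.shape[0] == 0:  # no queries at all: everything stays grey
        return colors, np.full(num_points, -1)
    keep = scores >= score_thresh
    # Score of each (instance, point) pair; -inf where the instance is dropped or absent.
    per_point = np.where(masks & keep[:, None], scores[:, None], -np.inf)
    best = per_point.argmax(axis=0)
    assigned = np.isfinite(per_point.max(axis=0))
    colors[assigned] = instance_palette(len(scores))[best[assigned]]
    return colors, np.where(assigned, best, -1)
```

The resulting `[M, 3]` array lines up with `out["backbone_pc"].coordinates`. It could then be passed to viser's point-cloud API, for example `server.scene.add_point_cloud(...)` in recent viser releases.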

```bash
# auto-download the checkpoint + use a bundled sample point cloud
python demo_viser.py --port 8080

# your own cloud + vocabulary, local checkpoint
python demo_viser.py --ckpt /path/to/spaceformer_512_siglip2_ssccc.ckpt \
    --ply my_scene.ply --class-names chair table monitor wall floor

# full ScanNet200 label set
python demo_viser.py --ply my_scene.ply --use-scannet200
```

Then open the printed URL (default `http://localhost:8080`) in a browser.
Without `--ply`, the demo uses a sample point cloud bundled with open3d (or a synthesized
random RGB cloud). Such a generic cloud won't segment meaningfully; it only shows that the
pipeline and viewer run end to end. The demo colors the model's **output** points
(`out["backbone_pc"].coordinates`), because those are the points the predicted masks index
into after the model's internal voxelization. It does not color the raw `.ply` points,
whose count may differ.
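
A plain voxel-grid quantization shows why the output point count differs from the raw count. This sketch assumes simple floor quantization at the 2 cm voxel size stated in the CLI section, keeping one entry per occupied voxel. It is not WarpConvNet's voxelization code, and its voxel order will generally not match `out["backbone_pc"]`. So `lift_to_raw` (a hypothetical helper) is valid only for labels computed in this sketch's own voxel order.

```python
import numpy as np


def voxelize(coord, voxel_size=0.02):
    """Quantize xyz (meters) to integer voxel indices and keep one entry per occupied voxel.

    Returns (voxel_coords [V, 3] int64, inverse [N]), where inverse[i] is the voxel
    that raw point i fell into. V <= N, which is why the point counts differ.
    """
    grid = np.floor(np.asarray(coord, dtype=np.float64) / voxel_size).astype(np.int64)
    voxel_coords, inverse = np.unique(grid, axis=0, return_inverse=True)
    return voxel_coords, inverse.reshape(-1)  # reshape guards against NumPy-version shape differences


def lift_to_raw(per_voxel_label, inverse):
    """Broadcast a per-voxel label (e.g. an instance id) back to every raw point."""
    return np.asarray(per_voxel_label)[inverse]
```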

## CLI inference

```bash
# local checkpoint
python inference.py --ckpt /path/to/spaceformer_512_siglip2_ssccc.ckpt \
    --scene /path/to/scene_dir                 # dir with coord.npy + color.npy

# or auto-download from a HuggingFace model repo
HF_REPO_ID=chrischoy/SpaCeFormer python inference.py \
    --scene my_scene.ply --class-names "office chair" "desk" "monitor" "other"

# full ScanNet200 label set
python inference.py --ckpt <ckpt> --scene <scene> --use-scannet200
```

`--scene` accepts a directory containing `coord.npy` (`[N,3]` float, meters) and
`color.npy` (`[N,3]`, 0–255), a `.npz` with `coord` and `color` arrays, an `[N,6]` `.npy`
(xyz, rgb), or a `.ply`. Keep coordinates in **meters**: the model voxelizes internally at
2 cm. The output is a list of `{label, score, #points}` entries ranked by
`score = objectness · mask_quality · class_prob`.
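
The input formats and the ranking rule fit in a few lines of NumPy. `load_scene` and `rank_instances` are hypothetical helpers, not functions exported by `inference.py`. `.ply` parsing is left out because it needs a PLY reader, and open3d (already used by the viser demo) is one option. The sketch also does not show how the model computes `objectness`, `mask_quality`, and `class_prob`.

```python
from pathlib import Path

import numpy as np


def load_scene(path):
    """Return (coord [N,3] float32 meters, color [N,3] float32 in 0-255) for the non-.ply formats."""
    path = Path(path)
    if path.is_dir():  # directory with coord.npy + color.npy
        coord, color = np.load(path / "coord.npy"), np.load(path / "color.npy")
    elif path.suffix.lower() == ".npz":  # archive with "coord" and "color" arrays
        with np.load(path) as data:
            coord, color = data["coord"], data["color"]
    elif path.suffix.lower() == ".npy":  # single [N,6] array: xyz then rgb
        arr = np.load(path)
        if arr.ndim != 2 or arr.shape[1] != 6:
            raise ValueError(f"expected an [N,6] xyz+rgb array, got {arr.shape}")
        coord, color = arr[:, :3], arr[:, 3:]
    else:
        raise ValueError(f"{path.suffix!r} not handled here (.ply needs a PLY reader such as open3d)")
    coord = np.asarray(coord, dtype=np.float32)
    color = np.asarray(color, dtype=np.float32)
    if coord.shape != color.shape or coord.shape[1:] != (3,):
        raise ValueError(f"coord {coord.shape} and color {color.shape} must both be [N,3]")
    return coord, color


def rank_instances(labels, objectness, mask_quality, class_prob, num_points):
    """Score each instance as objectness * mask_quality * class_prob, highest first."""
    score = np.asarray(objectness) * np.asarray(mask_quality) * np.asarray(class_prob)
    order = np.argsort(-score, kind="stable")
    return [{"label": labels[i], "score": float(score[i]), "#points": int(num_points[i])} for i in order]
```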

## License

Apache-2.0, matching the WarpConvNet `space_former.py` SPDX header.