SigLIP 2 B/16-256 — Core ML
Core ML conversion of google/siglip2-base-patch16-256, split into separate image and text encoders for on-device text→image retrieval. Built for Palmier Pro's footage search; usable by anything that wants SigLIP 2 on Apple silicon.
Files
| File | Contents |
|---|---|
ImageEncoder.mlpackage.zip |
Vision tower, 256×256 input, 8-bit palettized (per-grouped-channel) |
TextEncoder.mlpackage.zip |
Text tower, 64-token input, 8-bit palettized |
tokenizer.zip |
Gemma SentencePiece tokenizer files (tokenizer.json, config) |
manifest.json |
File names, sha256s, sizes, model dims |
Both encoders emit L2-normalized 768-d embeddings (embedding output); similarity
is a plain dot product. Minimum deployment target: macOS 15.
Usage notes
- Image preprocessing is a squash-resize to 256×256 (no center crop), pixels
scaled to [-1, 1]. The
ImageTypeinput already applies the scaling. - Text must be tokenized with the bundled Gemma tokenizer and padded to 64 with the pad token (0), no attention mask — SigLIP was trained that way and embeddings drift if padding differs.
- Conversion is parity-gated: every release's embeddings match the PyTorch
reference at cosine ≥ 0.99 on a fixture set. Conversion source:
palmier-io/palmier-pro
models/siglip2.
Versioning
Files in this repo are immutable once published. Re-conversions are published as new versions, never overwrites.
License
Apache 2.0, same as the original weights by Google. This repository redistributes a converted form of those weights without modification to their values beyond 8-bit palettization.
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for palmier-io/siglip2-base-coreml
Base model
google/siglip2-base-patch16-256