SigLIP 2 B/16-256 — Core ML

Core ML conversion of google/siglip2-base-patch16-256, split into separate image and text encoders for on-device text→image retrieval. Built for Palmier Pro's footage search; usable by anything that wants SigLIP 2 on Apple silicon.

Files

File	Contents
`ImageEncoder.mlpackage.zip`	Vision tower, 256×256 input, 8-bit palettized (per-grouped-channel)
`TextEncoder.mlpackage.zip`	Text tower, 64-token input, 8-bit palettized
`tokenizer.zip`	Gemma SentencePiece tokenizer files (`tokenizer.json`, config)
`manifest.json`	File names, sha256s, sizes, model dims

Both encoders emit L2-normalized 768-d embeddings (embedding output); similarity is a plain dot product. Minimum deployment target: macOS 15.

Usage notes

Image preprocessing is a squash-resize to 256×256 (no center crop), pixels scaled to [-1, 1]. The ImageType input already applies the scaling.
Text must be tokenized with the bundled Gemma tokenizer and padded to 64 with the pad token (0), no attention mask — SigLIP was trained that way and embeddings drift if padding differs.
Conversion is parity-gated: every release's embeddings match the PyTorch reference at cosine ≥ 0.99 on a fixture set. Conversion source: palmier-io/palmier-pro models/siglip2.

Versioning

Files in this repo are immutable once published. Re-conversions are published as new versions, never overwrites.

License

Apache 2.0, same as the original weights by Google. This repository redistributes a converted form of those weights without modification to their values beyond 8-bit palettization.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for palmier-io/siglip2-base-coreml

Base model

google/siglip2-base-patch16-256

Finetuned

(3)

this model