--- license: apache-2.0 base_model: google/siglip2-base-patch16-256 tags: - coreml - siglip2 - image-text-retrieval library_name: coreml --- # SigLIP 2 B/16-256 — Core ML Core ML conversion of [google/siglip2-base-patch16-256](https://huggingface.co/google/siglip2-base-patch16-256), split into separate image and text encoders for on-device text→image retrieval. Built for [Palmier Pro](https://palmier.io)'s footage search; usable by anything that wants SigLIP 2 on Apple silicon. ## Files | File | Contents | |---|---| | `ImageEncoder.mlpackage.zip` | Vision tower, 256×256 input, 8-bit palettized (per-grouped-channel) | | `TextEncoder.mlpackage.zip` | Text tower, 64-token input, 8-bit palettized | | `tokenizer.zip` | Gemma SentencePiece tokenizer files (`tokenizer.json`, config) | | `manifest.json` | File names, sha256s, sizes, model dims | Both encoders emit L2-normalized 768-d embeddings (`embedding` output); similarity is a plain dot product. Minimum deployment target: macOS 15. ## Usage notes - Image preprocessing is a **squash-resize** to 256×256 (no center crop), pixels scaled to [-1, 1]. The `ImageType` input already applies the scaling. - Text must be tokenized with the bundled Gemma tokenizer and **padded to 64 with the pad token (0), no attention mask** — SigLIP was trained that way and embeddings drift if padding differs. - Conversion is parity-gated: every release's embeddings match the PyTorch reference at cosine ≥ 0.99 on a fixture set. Conversion source: [palmier-io/palmier-pro `models/siglip2`](https://github.com/palmier-io). ## Versioning Files in this repo are immutable once published. Re-conversions are published as new versions, never overwrites. ## License Apache 2.0, same as the original weights by Google. This repository redistributes a converted form of those weights without modification to their values beyond 8-bit palettization.