---
license: gemma
tags:
- coreai
- sentence-similarity
- feature-extraction
- apple-silicon
- on-device
---

# EmbeddingGemma 300m: Core AI export

[google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) as a single static Core AI graph. The full sentence-transformers pipeline (transformer → mean pooling → dense projection → L2 normalize) runs in-graph, so one call returns a normalized 768-d embedding. It targets on-device semantic search and RAG on macOS 27 / iOS 27 beta.

Runs out of the box with [CoreAIKit](https://github.com/john-rocky/coreai-kit)'s `TextEmbedder`:

```swift
let embedder = try await TextEmbedder()   // downloads this repo
let doc = try await embedder.embed(document: "Tokyo is the capital of Japan.")
let query = try await embedder.embed(query: "what is the capital of Japan")
let score = TextEmbedder.cosineSimilarity(doc, query)
```

`TextEmbedder` applies the retrieval prompt prefixes automatically: `task: search result | query: ` for queries and `title: none | text: ` for documents.

## Bundle layout

```
model/
├── embeddinggemma-300m_float32_static.aimodel
├── tokenizer/        (HF tokenizer files)
└── reference.json    (torch reference cosines used by the parity test)
```

## Graph contract

| | name | shape | dtype |
|---|---|---|---|
| input | `input_ids` | [1, 256] | int32 (pad id 0) |
| input | `attention_mask` | [1, 256] | int32 (0 over padding) |
| output | `embedding` | [1, 768] | fp32, L2-normalized |

Precision: fp32. Cross-runtime parity against the torch SentenceTransformer pipeline is exact to 6 decimal places (see `reference.json`). fp16 variants (both full cast and mixed-precision autocast) produce NaN embeddings on-device because Gemma3 activations overflow half precision, so fp32 is shipped. A smaller int8 variant is future work.

## License

Gemma Terms of Use (see the upstream model card). Conversion script: this repo's sibling, based on apple/coreai-models' recipe patterns (BSD-3-Clause).
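
## Example: shaping inputs for the graph contract

For callers that drive the graph directly instead of going through `TextEmbedder`, the graph contract above implies that token ids must be brought to a fixed length of 256, with pad id 0 and a mask of 0 over the padding. The following is a minimal sketch of that step, not CoreAIKit's implementation: `padForGraph` and `dot` are hypothetical helpers, and both right-padding and truncation of over-long inputs are assumptions that the contract does not spell out. The token ids in the usage note are arbitrary placeholders.

```swift
// Hypothetical helpers, not part of CoreAIKit.
let sequenceLength = 256   // static sequence length from the graph contract
let padID: Int32 = 0       // pad id from the graph contract

/// Pads (or truncates) token ids to the static length and builds the matching mask.
/// Assumption: padding goes on the right and over-long inputs are truncated.
func padForGraph(_ tokenIDs: [Int32]) -> (inputIDs: [Int32], attentionMask: [Int32]) {
    let kept = Array(tokenIDs.prefix(sequenceLength))
    let padCount = sequenceLength - kept.count
    let inputIDs = kept + [Int32](repeating: padID, count: padCount)
    // 1 over real tokens, 0 over padding.
    let attentionMask = [Int32](repeating: 1, count: kept.count)
        + [Int32](repeating: 0, count: padCount)
    return (inputIDs, attentionMask)
}

/// Because the graph output is already L2-normalized, the cosine similarity
/// of two embeddings reduces to a plain dot product.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    return zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}
```

Calling `padForGraph([2, 15, 27])` yields two arrays of 256 elements: the ids followed by zeros, and a mask with three leading ones. Real token ids would come from the bundled tokenizer with the retrieval prefix already applied to the text.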