mlboydaisuke
/

gpt-oss-20b-CoreAI-official

+---
+license: apache-2.0
+base_model: openai/gpt-oss-20b
+tags:
+- coreai
+- apple
+- aimodel
+- apple-silicon
+---
+# gpt-oss-20B (MoE) — official Apple Core AI export
+Pre-converted **`.aimodel` bundles from Apple's official
+[coreai-models](https://github.com/apple/coreai-models) export recipe — unmodified**,
+with the exact environment, hashes, and measured performance published.
+```bash
+uv run coreai.llm.export openai/gpt-oss-20b    # MXFP4 passes through, ~3 min convert
+```
+## Why pre-converted bundles?
+1. **The conversion needs a big-RAM Mac** (the 20B export was done on 128 GB);
+   running only needs enough RAM to mmap the artifact.
+2. **An `.aimodel` is a build artifact, not a pure function of the recipe** — the
+   same export command produced a 2.2× slower artifact across the macOS 26 → 27β
+   boundary ([forensics](https://github.com/john-rocky/apple-silicon-llm-bench/blob/main/methodology/coreai-export-lowering.md)).
+   Hosted artifacts + hashes are the reproducible ground truth; every bundle here
+   is exactly the one measured in
+   [apple-silicon-llm-bench](https://github.com/john-rocky/apple-silicon-llm-bench).
+## Bundles & integrity
+| Bundle | Contents | SHA-256 (`main.mlirb`) |
+|---|---|---|
+| `macos/` | macOS dynamic, MXFP4 (as shipped by OpenAI) | `63fb96f521f2579efb4d38013037431852bc4f07f471c8b78997ced3d20a230c` |
+## Measured (Apple's official `llm-benchmark`, greedy)
+| Bundle | Protocol | Decode tok/s | Prefill | Load (warm) | Peak RSS |
+|---|---|---:|---:|---:|---:|
+| macos | M4 Max, 512p/1024g | 78.1 | 1,252 | 2.1 s | 33.9 GB |
+`COREAI_CHUNK_THRESHOLD` is a prefill memory dial on this MoE: unchunked 4096-token
+prefill = 1,439 tok/s at 18 GB dirty footprint; chunk-128 = 766 tok/s at 1.7 GB.
+## Export environment
+- macOS 27.0 beta (build 26A5353q) · Xcode 27.0 (27A5194q)
+- `coreai-core 1.0.0b1` · `coreai-torch 0.4.0` · `coreai-opt 0.2.0` · `torch 2.9.0`
+- apple/coreai-models @ `b1cb71b` (export code identical to upstream `0c1055f`)
+## Run it
+```bash
+# CLI (from a coreai-models checkout)
+swift run -c release llm-runner --model <downloaded-bundle-dir> --prompt "Hello"
+swift run -c release llm-benchmark --model <downloaded-bundle-dir>
+```
+Or chat with it in [CoreAIChatMac](https://github.com/john-rocky/coreai-samples)
+(point "Choose Models Folder…" at the download directory).
+iOS static bundles must be AOT-compiled before device use:
+`xcrun coreai-build compile <ir>.aimodel --platform iOS --preferred-compute neural-engine --architecture h18p`
+(h18p = iPhone 17 Pro), then set `metadata.json` `assets.main` to the `.aimodelc`.
+---
+Maintained alongside [coreai-model-zoo](https://github.com/john-rocky/coreai-model-zoo)
+(community models) and [coreai-samples](https://github.com/john-rocky/coreai-samples) (apps).