mlboydaisuke committed on
Commit 4a732c9 · verified · 1 Parent(s): 12f4773

Kokoro-82M -> Core AI: predictor/prosody/vocoder bundles + 28 EN voices

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ kokoro_predictor.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
37
+ kokoro_prosody.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
38
+ kokoro_vocoder.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,79 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: coreai
6
+ pipeline_tag: text-to-speech
7
+ tags:
8
+ - text-to-speech
9
+ - tts
10
+ - core-ai
11
+ - coreml
12
+ - on-device
13
+ - styletts2
14
+ - kokoro
15
+ base_model: hexgrad/Kokoro-82M
16
+ ---
17
+
18
+ # Kokoro-82M — Core AI
19
+
20
+ [`hexgrad/Kokoro-82M`](https://huggingface.co/hexgrad/Kokoro-82M) (Apache-2.0), a tiny
21
+ high-quality **StyleTTS2 + iSTFTNet** text-to-speech model (82M params, 24 kHz),
22
+ converted to Apple **Core AI** (`.aimodel`, iOS 27 / macOS 27) — the
23
+ [CoreAI-Model-Zoo](https://github.com/john-rocky/coreai-model-zoo)'s first TTS.
24
+
25
+ Non-autoregressive: phonemes + a voice/style vector → a waveform in one pass.
26
+ Runs fully on-device, English-first, with grapheme→phoneme on the host.
27
+
28
+ ## Bundles
29
+
30
+ The acoustic graph has one data-dependent length (the duration→alignment expansion),
31
+ so it is cut into **three voice-independent `.aimodel` bundles** with two cheap host
32
+ steps between them:
33
+
34
+ | file | in → out |
35
+ |---|---|
36
+ | `kokoro_predictor.aimodel` | `input_ids[1,128]` i32, `ref_s[1,256]`, `attn_mask[1,128]` → `duration`, `d`, `t_en` |
37
+ | `kokoro_prosody.aimodel` | `d`, `t_en`, `aln[1,128,512]`, `ref_s`, `frame_mask[1,512]` → `asr`, `F0`, `N` |
38
+ | `kokoro_vocoder.aimodel` | `asr`, `F0`, `N`, `har`, `ref_s`, `frame_mask` → `audio[1, L·600]` |
39
+
40
+ `voices/*.pt` — the **28 English voice packs** (Apache-2.0). The voice is the `ref_s`
41
+ input: `ref_s = pack[len(ids) - 1]`. Quality leaders: `af_heart`, `af_bella`,
42
+ `af_nicole`, `bf_emma`.
43
+
44
+ Token length **T** and frame length **L** are fixed **buckets** (128 / 512); the host
45
+ left-pads to the bucket and trims the output. Longer text is split into sentences
46
+ host-side. Run on the Core AI **CPU** compute unit: about 0.75 s per utterance on an
47
+ M4 Max, and about 335 MB total across the three bundles (fp32).
48
+
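As a rough illustration of the bucket padding and voice selection described above, the host-side preparation might look like the sketch below. The helper names (`pad_to_bucket`, `select_ref_s`), the pad id of 0, the mask convention (1 for real tokens) and the float32 mask dtype are assumptions made for illustration and are not taken from the export script. The voice pack is assumed to be already loaded as an array that can be indexed by token count.

```python
import numpy as np

TOKEN_BUCKET = 128  # token bucket T from the bundle table


def pad_to_bucket(ids, bucket=TOKEN_BUCKET, pad_id=0):
    """Left-pad token ids to the fixed bucket.

    Returns (input_ids[1, bucket] int32, attn_mask[1, bucket], n_pad).
    pad_id=0 and "1 = real token" are assumptions, not confirmed by the README.
    """
    n = len(ids)
    if n > bucket:
        raise ValueError(f"{n} tokens exceed the bucket of {bucket}; split the text first")
    n_pad = bucket - n
    input_ids = np.full((1, bucket), pad_id, dtype=np.int32)
    input_ids[0, n_pad:] = ids
    attn_mask = np.zeros((1, bucket), dtype=np.float32)
    attn_mask[0, n_pad:] = 1.0
    return input_ids, attn_mask, n_pad


def select_ref_s(pack, n_tokens):
    """Pick the style vector for this utterance: ref_s = pack[len(ids) - 1]."""
    if n_tokens < 1:
        raise ValueError("need at least one token to index the voice pack")
    return np.asarray(pack[n_tokens - 1], dtype=np.float32).reshape(1, 256)
```

With three token ids, `pad_to_bucket` leaves 125 pad slots in front, and `select_ref_s(pack, 3)` reads row 2 of the pack.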
49
+ ## Host steps
50
+
51
+ ```
52
+ text ──(misaki G2P)──▶ ids ──▶ predictor ──▶ [build alignment] ──▶ prosody
53
+ ──▶ [har = STFT(SineGen(f0_upsamp(F0)))] ──▶ vocoder ──▶ [trim] ──▶ 24 kHz audio
54
+ ```
55
+
56
+ G2P is [misaki](https://github.com/hexgrad/misaki) (`misaki[en]`, no espeak for
57
+ English); on-device [MisakiSwift](https://github.com/mlalma/MisakiSwift) gives the same
58
+ English phonemes. `har` (the hn-nsf source's STFT) is a windowed FFT computed on the
59
+ host. It is the one piece that must stay off the engine, because its `atan2` phase
60
+ flips by 2π at the F0→0 pad boundary under fp32.
61
+
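The `[build alignment]` host step expands per-token durations into the `aln[1,128,512]` matrix that the prosody bundle consumes. A minimal sketch of that expansion follows, assuming integer frame counts for the real tokens only, a one-hot token-to-frame layout, and left-padding on both the token and frame axes. The function name `build_alignment` and these layout choices are assumptions for illustration; the export script may round, clamp or place the padding differently.

```python
import numpy as np


def build_alignment(durations, token_bucket=128, frame_bucket=512):
    """Expand per-token frame counts into a one-hot alignment matrix.

    durations: frames per real token (padding tokens excluded).
    Returns (aln[1, token_bucket, frame_bucket], frame_mask[1, frame_bucket], n_frames).
    Left-padding on both axes is an assumption, not confirmed by the README.
    """
    dur = np.asarray(durations, dtype=np.int64).reshape(-1)
    n_tok, n_frames = dur.size, int(dur.sum())
    if n_tok > token_bucket or n_frames > frame_bucket:
        raise ValueError("utterance does not fit the buckets; split the text first")
    tok_off = token_bucket - n_tok      # first real token slot
    frm_off = frame_bucket - n_frames   # first real frame slot
    aln = np.zeros((1, token_bucket, frame_bucket), dtype=np.float32)
    token_of_frame = np.repeat(np.arange(n_tok), dur)  # owning token for each frame
    aln[0, tok_off + token_of_frame, frm_off + np.arange(n_frames)] = 1.0
    frame_mask = np.zeros((1, frame_bucket), dtype=np.float32)
    frame_mask[0, frm_off:] = 1.0
    return aln, frame_mask, n_frames
```

Since the vocoder output is `audio[1, L·600]`, the returned `n_frames` also tells the host how many samples to keep when trimming: `n_frames * 600`.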
62
+ ## Quality
63
+
64
+ The hn-nsf source phase is arbitrary (stock Kokoro randomizes it), so the gate is
65
+ spectral: **magnitude-spectrogram correlation 0.999** vs the PyTorch reference
66
+ (`af_heart`, multiple sentences). Raw waveform correlation is ~0.98; the gap is the
67
+ bounded, inaudible effect of the bucket pad boundary.
68
+
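To see why a spectral gate is more meaningful than a waveform gate when the source phase is arbitrary, the sketch below compares two signals in both ways. It is not the `--verify` implementation: the function names, the Hann window, and the `n_fft=1024` / `hop=256` settings are assumptions chosen for illustration.

```python
import numpy as np


def magnitude_spectrogram(x, n_fft=1024, hop=256):
    """Hann-windowed STFT magnitudes, shape [frames, n_fft // 2 + 1]."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def correlation(a, b):
    """Pearson correlation of two arrays after flattening."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def spectral_and_waveform_correlation(reference, candidate):
    """Return (magnitude-spectrogram correlation, raw waveform correlation)."""
    n = min(len(reference), len(candidate))
    ref, cand = np.asarray(reference)[:n], np.asarray(candidate)[:n]
    spec = correlation(magnitude_spectrogram(ref), magnitude_spectrogram(cand))
    return spec, correlation(ref, cand)
```

Two tones with identical amplitudes but shifted phases score near 1 on the spectral measure while their waveform correlation collapses, which is the situation the gate is designed to tolerate.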
69
+ ## Convert / re-bucket
70
+
71
+ [`conversion/export_kokoro.py`](https://github.com/john-rocky/coreai-model-zoo/blob/main/conversion/export_kokoro.py)
72
+ (`python export_kokoro.py --out-dir out`; `--verify` runs the engine-vs-torch spectral
73
+ gate; `--token-bucket` / `--frame-bucket` to re-size). Card + the full port write-up:
74
+ [`zoo/kokoro-82m.md`](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/kokoro-82m.md).
75
+
76
+ ## License
77
+
78
+ Apache-2.0 (model weights and the 28 English voices). The Core AI export code derives
79
+ from Apple's BSD-3-Clause `coreai_models`.
kokoro_predictor.aimodel/main.hash ADDED
@@ -0,0 +1 @@
 
 
1
+ (binary hash content, not displayable as text)
kokoro_predictor.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a1ec8433cc125519cc051bc17c65adfb3142b36add56fa3c85ab636a759f36fc
3
+ size 83283367
kokoro_predictor.aimodel/metadata.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "description" : "Kokoro-82M TTS (StyleTTS2 + iSTFTNet). https:\/\/huggingface.co\/hexgrad\/Kokoro-82M",
3
+ "assetVersion" : "2.0",
4
+ "author" : "hexgrad (Kokoro-82M); Core AI export: coreai-model-zoo",
5
+ "license" : "Apache-2.0"
6
+ }
kokoro_prosody.aimodel/main.hash ADDED
@@ -0,0 +1 @@
 
 
1
+ (binary hash content, not displayable as text)
kokoro_prosody.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82d466933bd4fe87cc7e80f8a56d7b6c41adc7884825ed7c7d4c6323cdc756f0
3
+ size 38183716
kokoro_prosody.aimodel/metadata.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "description" : "Kokoro-82M TTS (StyleTTS2 + iSTFTNet). https:\/\/huggingface.co\/hexgrad\/Kokoro-82M",
3
+ "assetVersion" : "2.0",
4
+ "author" : "hexgrad (Kokoro-82M); Core AI export: coreai-model-zoo",
5
+ "license" : "Apache-2.0"
6
+ }
kokoro_vocoder.aimodel/main.hash ADDED
@@ -0,0 +1 @@
 
 
1
+ (binary hash content, not displayable as text)
kokoro_vocoder.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8a92154211712b93efdf7608b25e1cf14b9c4483e85b6d7fd76ac0185aff68c8
3
+ size 213885093
kokoro_vocoder.aimodel/metadata.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "description" : "Kokoro-82M TTS (StyleTTS2 + iSTFTNet). https:\/\/huggingface.co\/hexgrad\/Kokoro-82M",
3
+ "assetVersion" : "2.0",
4
+ "author" : "hexgrad (Kokoro-82M); Core AI export: coreai-model-zoo",
5
+ "license" : "Apache-2.0"
6
+ }
voices/af_alloy.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6d877149dd8b348fbad12e5845b7e43d975390e9f3b68a811d1d86168bef5aa3
3
+ size 523425
voices/af_aoede.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c03bd1a4c3716c2d8eaa3d50022f62d5c31cfbd6e15933a00b17fefe13841cc4
3
+ size 523425
voices/af_bella.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8cb64e02fcc8de0327a8e13817e49c76c945ecf0052ceac97d3081480e8e48d6
3
+ size 523425
voices/af_heart.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ab5709b8ffab19bfd849cd11d98f75b60af7733253ad0d67b12382a102cb4ff
3
+ size 523425
voices/af_jessica.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cdfdccb8cc975aa34ee6b89642963b0064237675de0e41a30ae64cc958dd4e87
3
+ size 523435
voices/af_kore.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bfbc512321c3db49dff984ac675fa5ac7eaed5a96cc31104d3a9080e179d69d
3
+ size 523420
voices/af_nicole.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5561808bcf5250fe8c5f5de32caf2d94f27e57e95befdb098c5c85991d4c5da
3
+ size 523430
voices/af_nova.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e0233676ddc21908c37a1f102f6b88a59e4e5c1bd764983616eb9eda629dbcd2
3
+ size 523420
voices/af_river.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e149459bd9c084416b74756b9bd3418256a8b839088abb07d463730c369dab8f
3
+ size 523425
voices/af_sarah.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:49bd364ea3be9eb3e9685e8f9a15448c4883112a7c0ff7ab139fa4088b08cef9
3
+ size 523425
voices/af_sky.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c799548aed06e0cb0d655a85a01b48e7f10484d71663f9a3045a5b9362e8512c
3
+ size 523351
voices/am_adam.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ced7e284aba12472891be1da3ab34db84cc05cc02b5889535796dbf2d8b0cb34
3
+ size 523420
voices/am_echo.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bcfdc852bc985fb45c396c561e571ffb9183930071f962f1b50df5c97b161e8
3
+ size 523420
voices/am_eric.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ada66f0eefff34ec921b1d7474d7ac8bec00cd863c170f1c534916e9b8212aae
3
+ size 523420
voices/am_fenrir.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:98e507eca1db08230ae3b6232d59c10aec9630022d19accac4f5d12fcec3c37a
3
+ size 523430
voices/am_liam.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c82550757ddb31308b97f30040dda8c2d609a9e2de6135848d0a948368138518
3
+ size 523420
voices/am_michael.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9a443b79a4b22489a5b0ab7c651a0bcd1a30bef675c28333f06971abbd47bd37
3
+ size 523435
voices/am_onyx.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e8452be16cd0f6da7b4579eaf7b1e4506e92524882053d86d72b96b9a7fed584
3
+ size 523420
voices/am_puck.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dd1d8973f4ce4b7d8ae407c77a435f485dabc052081b80ea75c4f30b84f36223
3
+ size 523420
voices/am_santa.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7f2f7582fa2b1f160e90aafe6d0b442a685e773608b6667e545d743b073e97a7
3
+ size 523425
voices/bf_alice.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d292651b6af6c0d81705c2580dcb4463fccc0ff7b8d618a471dbb4e45655b3f3
3
+ size 523425
voices/bf_emma.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d0a423deabf4a52b4f49318c51742c54e21bb89bbbe9a12141e7758ddb5da701
3
+ size 523420
voices/bf_isabella.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cdd4c37003805104d1d08fb1e05855c8fb2c68de24ca6e71f264a30aaa59eefd
3
+ size 523440
voices/bf_lily.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e09c2e481e2d53004d7e5ae7d3a325369e130a6f45c35a6002de75084be9285
3
+ size 523420
voices/bm_daniel.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fc3fce4e9c12ed4dbc8fa9680cfe51ee190a96444ce7c3ad647549a30823fc5d
3
+ size 523430
voices/bm_fable.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d44935f3135257a9064df99f007fc1342ff1aa767552b4a4fa4c3b2e6e59079c
3
+ size 523425
voices/bm_george.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f1bc812213dc59774769e5c80004b13eeb79bd78130b11b2d7f934542dab811b
3
+ size 523430
voices/bm_lewis.pt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b5204750dcba01029d2ac9cec17aec3b20a6d64073c579d694a23cb40effbd0e
3
+ size 523425