---
library_name: mlx
pipeline_tag: text-to-speech
tags:
- indextts2
- mlx-indextts
- voice-cloning
- fp16
- zh
- en
- text-to-speech
- apple-silicon
- mlx
license: mit
---
# mlx-indextts2-standard-fp16
This is a converted MLX IndexTTS2 model for Apple Silicon inference with [`solar2ain/mlx-indextts`](https://github.com/solar2ain/mlx-indextts).
It was prepared for the local `/Users/vanch/index-tts` IndexTTS2 optimization project, where the goal was stable Vietnamese and multilingual TTS on an M3 Max Mac without PyTorch MPS memory crashes.
## Variant
- Profile: **Standard multilingual**
- Precision / quantization: **fp16**
- Approx local size: **2.0GB**
- Source checkpoint directory during conversion: `/Users/vanch/index-tts/checkpoints`
- Note: All floating-point MLX weights were cast to fp16 from the standard fp32 conversion.
- Conversion detail: Derived locally by casting the floating-point tensors in the MLX safetensors files to `float16`; this is not an upstream CLI quantization mode.
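A cast of this kind can be sketched with MLX's Python API. This is a minimal illustration, not the script actually used for this repository: the function names `list_weight_files`, `cast_file_to_fp16`, and `cast_model_dir` are hypothetical, and the non-weight files (tokenizer, configs, `.pt` files) are assumed to be copied over unchanged.

```python
from pathlib import Path


def list_weight_files(model_dir: str) -> list[str]:
    """Return the sorted names of all .safetensors files in model_dir."""
    return sorted(p.name for p in Path(model_dir).glob("*.safetensors"))


def cast_file_to_fp16(src: str, dst: str) -> None:
    """Load one safetensors file, cast floating-point tensors to float16, save it."""
    import mlx.core as mx  # imported lazily so the helper above works without MLX

    weights = mx.load(src)  # dict of name -> mx.array for .safetensors input
    converted = {
        # Integer and boolean tensors, if any, are left untouched.
        name: arr.astype(mx.float16) if mx.issubdtype(arr.dtype, mx.floating) else arr
        for name, arr in weights.items()
    }
    mx.save_safetensors(dst, converted)


def cast_model_dir(src_dir: str, dst_dir: str) -> None:
    """Cast every .safetensors file in src_dir and write the result to dst_dir."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for name in list_weight_files(src_dir):
        cast_file_to_fp16(str(Path(src_dir) / name), str(Path(dst_dir) / name))
```

Checking `mx.issubdtype(arr.dtype, mx.floating)` before casting keeps any non-float tensors in their original dtype, which matches the "floating weights only" description above.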
## Expected Files
The repository root is a ready-to-use MLX IndexTTS2 model directory:
- `gpt.safetensors`
- `s2mel.safetensors`
- `bigvgan.safetensors`
- `vq2emb.safetensors`
- `tokenizer.model`
- `config.yaml`
- `config.json`
- `feat1.pt`
- `feat2.pt`
- `wav2vec2bert_stats.pt`
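After downloading, the directory can be checked against the list above. The following is a small sketch; `missing_files` is a hypothetical helper name, and it only checks that each file exists, not that its contents are valid.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "gpt.safetensors",
    "s2mel.safetensors",
    "bigvgan.safetensors",
    "vq2emb.safetensors",
    "tokenizer.model",
    "config.yaml",
    "config.json",
    "feat1.pt",
    "feat2.pt",
    "wav2vec2bert_stats.pt",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("models/mlx-indextts2-standard-fp16")` returns an empty list when the download is complete.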
## Usage
Install and use `mlx-indextts`:
```bash
# Clone the MLX runtime and install its dependencies
git clone https://github.com/solar2ain/mlx-indextts.git
cd mlx-indextts
uv sync --extra convert --extra v2

# Download this model into models/
huggingface-cli download vanch007/mlx-indextts2-standard-fp16 \
  --local-dir models/mlx-indextts2-standard-fp16 \
  --local-dir-use-symlinks False

# Generate speech; -r takes a reference audio file or a precomputed speaker .npz
uv run mlx-indextts generate \
  -m models/mlx-indextts2-standard-fp16 \
  -r /path/to/reference_or_speaker.npz \
  -t "Your text here" \
  -o output.wav \
  --memory-limit 24 \
  --diffusion-steps 16
```
For repeated generation, precompute speaker conditioning first:
```bash
# Extract speaker conditioning once and save it as speaker.npz
uv run mlx-indextts speaker \
  -m models/mlx-indextts2-standard-fp16 \
  -r /path/to/reference.wav \
  -o speaker.npz \
  --memory-limit 24
```
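Once `speaker.npz` exists, it can be reused across many `generate` calls. The sketch below drives the CLI from Python using only the flags shown above; `build_generate_cmd` and `generate_all` are hypothetical helper names, and the output naming scheme is an assumption.

```python
import subprocess


def build_generate_cmd(
    model_dir: str,
    speaker_npz: str,
    text: str,
    out_wav: str,
    memory_limit: int = 24,
    diffusion_steps: int = 16,
) -> list[str]:
    """Build the argv list for one `mlx-indextts generate` call."""
    return [
        "uv", "run", "mlx-indextts", "generate",
        "-m", model_dir,
        "-r", speaker_npz,
        "-t", text,
        "-o", out_wav,
        "--memory-limit", str(memory_limit),
        "--diffusion-steps", str(diffusion_steps),
    ]


def generate_all(texts: list[str], model_dir: str, speaker_npz: str,
                 out_prefix: str = "output") -> list[str]:
    """Run one CLI call per text and return the list of output paths."""
    outputs = []
    for i, text in enumerate(texts, start=1):
        out_wav = f"{out_prefix}_{i:03d}.wav"  # hypothetical naming scheme
        subprocess.run(
            build_generate_cmd(model_dir, speaker_npz, text, out_wav),
            check=True,  # raise if the CLI exits with a non-zero status
        )
        outputs.append(out_wav)
    return outputs
```

Each call here starts a fresh process, so only the speaker conditioning is reused; the model weights are loaded again on every call.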
## Benchmark
Benchmarked on a 128GB unified-memory M3 Max Mac using:
- `mlx-indextts` from `solar2ain/mlx-indextts`
- precomputed `.npz` speaker conditioning
- `memory_limit=24GB`
- `diffusion_steps=16`
- emotion=`calm`, `emo_alpha=0.6`
- same text set across fp32 / fp16 / 8bit / optimized PyTorch MPS
RTF (real-time factor) is synthesis time divided by the duration of the generated audio, so lower is faster:
| Case | fp32 MLX RTF | fp16 MLX RTF | 8bit MLX RTF | PyTorch MPS RTF |
|---|---:|---:|---:|---:|
| zh short | 1.127 | 1.538 | 0.966 | 1.446 |
| zh long | 1.232 | 1.584 | 1.035 | 1.699 |
| en short | 1.157 | 1.462 | 0.914 | 2.192 |
| en long | 1.193 | 1.511 | 0.956 | 1.783 |
Summary from the local comparison:
- 8bit was the fastest MLX route in this test set.
- fp16 saved space but was slower than fp32 for the standard profile.
- In the Vietnamese runs (not shown in the table above), fp16 was slightly faster than fp32, and 8bit was the fastest of the three.
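The comparison in the table can be reproduced with a few lines of arithmetic. The sketch below copies the four rows from the table and averages each column; the unweighted mean over the four cases is an illustrative choice, not a metric reported by the original benchmark, and the helper names are hypothetical.

```python
from statistics import mean

# RTF values copied from the table above (lower is faster).
RTF = {
    "fp32 MLX": {"zh short": 1.127, "zh long": 1.232, "en short": 1.157, "en long": 1.193},
    "fp16 MLX": {"zh short": 1.538, "zh long": 1.584, "en short": 1.462, "en long": 1.511},
    "8bit MLX": {"zh short": 0.966, "zh long": 1.035, "en short": 0.914, "en long": 0.956},
    "PyTorch MPS": {"zh short": 1.446, "zh long": 1.699, "en short": 2.192, "en long": 1.783},
}


def mean_rtf(table: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted mean RTF per backend across all cases."""
    return {backend: mean(cases.values()) for backend, cases in table.items()}


def fastest_backend(table: dict[str, dict[str, float]]) -> str:
    """Backend with the lowest mean RTF."""
    means = mean_rtf(table)
    return min(means, key=means.get)


def relative_time(table: dict[str, dict[str, float]], backend: str,
                  baseline: str) -> dict[str, float]:
    """Per-case ratio backend / baseline; values above 1 mean backend is slower."""
    return {case: table[backend][case] / table[baseline][case]
            for case in table[baseline]}
```

`relative_time(RTF, "fp16 MLX", "fp32 MLX")` gives the per-case slowdown of fp16 relative to fp32 for the standard profile.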
## ASR Validation
ASR validation with local `mlx_whisper` + `whisper-large-v3-turbo` found no empty audio, wrong-language output, or obvious missing sentences. Chinese long-form ASR showed a minor `她/他` homophone difference; English long-form 8-bit ASR showed a minor tense difference.
ASR was used only as an automated sanity check. Final production selection should still include human listening, especially for long-form Vietnamese narration.
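An automated check along these lines can be sketched as follows. This is not the validation script used here: the comparison method (character-level similarity after stripping punctuation), the `min_ratio` threshold, the helper names, and the `mlx-community/whisper-large-v3-turbo` repository ID are all assumptions made for illustration.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, apply NFKC, and keep only letters and digits."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())


def sanity_check(reference: str, transcript: str, min_ratio: float = 0.8) -> dict:
    """Flag empty transcripts and transcripts that diverge from the input text."""
    ref, hyp = normalize(reference), normalize(transcript)
    if not hyp:
        return {"ok": False, "ratio": 0.0, "reason": "empty transcript"}
    ratio = difflib.SequenceMatcher(None, ref, hyp, autojunk=False).ratio()
    reason = "match" if ratio >= min_ratio else "low similarity"
    return {"ok": ratio >= min_ratio, "ratio": ratio, "reason": reason}


def transcribe(wav_path: str) -> str:
    """Transcribe a file with mlx_whisper (requires the mlx-whisper package)."""
    import mlx_whisper  # lazy import so the checks above run without it

    result = mlx_whisper.transcribe(
        wav_path,
        path_or_hf_repo="mlx-community/whisper-large-v3-turbo",  # assumed repo ID
    )
    return result["text"]
```

A character-level ratio treats homophone substitutions such as `她/他` as small differences, which is why a check like this can only flag gross failures and does not replace listening.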
## Provenance and Scope
This is an MLX conversion for local Apple Silicon inference, not the original PyTorch release. The original implementation and model family are associated with IndexTTS / IndexTTS2; the MLX runtime used here is `solar2ain/mlx-indextts`.
The benchmark numbers are environment-specific and should be treated as local M3 Max results, not universal performance guarantees.
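To get numbers for a different machine, RTF can be measured directly. The sketch below times a command and divides by the duration of the WAV it produced. It assumes the output is a PCM WAV readable by Python's `wave` module, and because it times the whole process (including model loading), its results are not directly comparable to the table above, whose exact timing method is not documented here. The helper names are hypothetical.

```python
import subprocess
import time
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio duration; lower is faster."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(cmd: list[str], out_wav: str) -> float:
    """Run cmd, which is expected to write out_wav, and return its wall-clock RTF."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(out_wav))
```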