---
license: apache-2.0
tags:
- speech-enhancement
- denoising
- coreml
- apple-silicon
- deepfilternet
- int8
- palettization
base_model: Rikorose/DeepFilterNet3
library_name: coreml
pipeline_tag: audio-to-audio
---
# DeepFilterNet3 — CoreML INT8
Real-time speech enhancement for Apple Silicon. The model removes background
noise from speech audio and runs on the **Neural Engine** via CoreML.
- **2.1M params**, INT8 k-means palettization, **2.2 MB**
- 48 kHz native, 10 ms frames
- Requires macOS 14+ / iOS 17+
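As a quick worked example of what "48 kHz native, 10 ms frames" implies for buffer sizes, the sketch below converts the frame duration into samples. It assumes the 10 ms figure is the step between consecutive frames; the analysis window and overlap are not stated in this card. The helper names `samplesPerFrame` and `frameCount` are hypothetical and not part of speech-swift.

```swift
// Hypothetical helpers, for illustration only (not part of speech-swift).

/// Number of samples covered by one frame at the given sample rate.
func samplesPerFrame(sampleRate: Int, frameMilliseconds: Int) -> Int {
    return sampleRate * frameMilliseconds / 1000
}

/// Number of whole frames that fit into a buffer of `sampleCount` samples.
func frameCount(sampleCount: Int, sampleRate: Int, frameMilliseconds: Int) -> Int {
    let frameSize = samplesPerFrame(sampleRate: sampleRate,
                                    frameMilliseconds: frameMilliseconds)
    return sampleCount / frameSize
}

// 48 kHz with 10 ms frames gives 480 samples per frame.
let frameSize = samplesPerFrame(sampleRate: 48_000, frameMilliseconds: 10)

// A 5 s clip at 48 kHz holds 240,000 samples, i.e. 500 frames.
let frames = frameCount(sampleCount: 5 * 48_000,
                        sampleRate: 48_000,
                        frameMilliseconds: 10)

print("samples per frame: \(frameSize), frames in 5 s: \(frames)")
```

For streaming use, this means each frame leaves a 10 ms budget for processing if the pipeline is to keep up with the input.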
## Quality
Measured on 30 VoiceBank-DEMAND test clips using a Python `CoreMLBackend`
that replaces only the neural-network forward pass and keeps the PyTorch
STFT, ERB, and deep-filter post-processing intact.
| Variant | PESQ | STOI | SI-SDR (dB) | Size |
|---------|------|------|--------|------|
| PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
| CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
| **CoreML INT8 (this repo)** | **2.907** | **0.947** | **18.11** | **2.2 MB** |
INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR
−0.07 dB, STOI identical) while cutting size by 48% (4.2 MB to 2.2 MB).
## Latency (M2 Max)
| Audio duration | Processing time | RTF |
|----------|------|-----|
| 5 s | 0.65 s | 0.13 |
| 10 s | 1.2 s | 0.12 |
| 20 s | 4.8 s | 0.24 |
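RTF (real-time factor) in the table is consistent with processing time divided by audio duration, so values below 1 mean faster than real time. The sketch below recomputes the column from the two measured columns; `realTimeFactor` is a hypothetical helper name, not an API of speech-swift.

```swift
import Foundation

/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean the audio is processed faster than it plays back.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

// (audio duration, processing time) pairs taken from the latency table.
let measurements: [(audio: Double, processing: Double)] = [
    (5, 0.65),
    (10, 1.2),
    (20, 4.8),
]

for m in measurements {
    let rtf = realTimeFactor(processingSeconds: m.processing, audioSeconds: m.audio)
    print(String(format: "%.0f s clip: RTF %.2f", m.audio, rtf))
}
```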
## Files
| File | Size | Description |
|------|------|-------------|
| `DeepFilterNet3.mlmodelc` | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
| `auxiliary.npz` | 126 KB | ERB filterbank, Vorbis window, normalization states |
## Usage
Add [speech-swift](https://github.com/soniqo/speech-swift) to `Package.swift`:
```swift
.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
```
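For context, the dependency line above might sit in a complete manifest like the sketch below. The package and target name `DenoiseDemo` are hypothetical, and the product name `SpeechEnhancement` is an assumption based on the module imported in the usage example; check the speech-swift manifest for the actual product names. The platform versions mirror the macOS 14+ / iOS 17+ requirement stated at the top of this card.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "DenoiseDemo", // hypothetical package name
    platforms: [
        .macOS(.v14), // matches the stated macOS 14+ requirement
        .iOS(.v17),   // matches the stated iOS 17+ requirement
    ],
    dependencies: [
        .package(url: "https://github.com/soniqo/speech-swift", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "DenoiseDemo", // hypothetical target name
            dependencies: [
                // Assumed product name, inferred from `import SpeechEnhancement`.
                .product(name: "SpeechEnhancement", package: "speech-swift"),
            ]
        ),
    ]
)
```

Tracking `branch: "main"` follows the latest commit; pinning to a specific revision or tagged release, if one is available, gives more reproducible builds.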
Then denoise:
```swift
import SpeechEnhancement
let enhancer = try await SpeechEnhancer.fromPretrained()
let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
```
CLI:
```bash
swift run audio denoise noisy.wav --output clean.wav
```
## Source
- Base model: [Rikorose/DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (dual-licensed Apache-2.0 / MIT)
## License
- Model weights: Apache-2.0 / MIT dual license
- CoreML conversion: Apache-2.0
## Links
- [speech-swift](https://github.com/soniqo/speech-swift) — Apple SDK
- [soniqo.audio](https://soniqo.audio) — website
- [MLX vs CoreML on Apple Silicon — a practical guide](https://blog.ivan.digital/mlx-vs-coreml-on-apple-silicon-a-practical-guide-to-picking-the-right-backend-and-why-you-should-f77ddea7b27a) — related blog post
- [soniqo.audio/blog](https://soniqo.audio/blog) — blog
## Reference
- [DeepFilterNet3 paper](https://arxiv.org/abs/2305.08227)