---
license: apache-2.0
tags:
- speech-enhancement
- denoising
- coreml
- apple-silicon
- deepfilternet
- int8
- palettization
base_model: Rikorose/DeepFilterNet3
library_name: coreml
pipeline_tag: audio-to-audio
---

# DeepFilterNet3 — CoreML INT8

Real-time speech enhancement for Apple Silicon. Removes background noise
from speech audio and runs on the **Neural Engine** via CoreML.

- **2.1M params**, INT8 k-means palettization, **2.2 MB**
- 48 kHz native, 10 ms frames
- Requires macOS 14+ / iOS 17+
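
The frame arithmetic implied by the figures above can be sketched as follows. This is a minimal illustration, assuming "10 ms frames" refers to the step between consecutive frames; the constant and function names are hypothetical and not part of any API.

```swift
// Values taken from the model card; the names are illustrative only.
let sampleRate = 48_000        // Hz, native rate
let frameDurationMs = 10       // ms per frame (assumed to be the frame step)

// Samples consumed per frame: 48_000 * 10 / 1000
let samplesPerFrame = sampleRate * frameDurationMs / 1000

// Number of whole frames in a clip of the given length in seconds.
func frameCount(forSeconds seconds: Int) -> Int {
    (seconds * sampleRate) / samplesPerFrame
}

print(samplesPerFrame)            // 480
print(frameCount(forSeconds: 5))  // 500
```

Audio recorded at another rate would need resampling to 48 kHz before these numbers apply.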

## Quality

Measured on 30 VoiceBank-DEMAND test clips via the Python `CoreMLBackend`,
which replaces only the neural-network forward pass and keeps the PyTorch
STFT / ERB / deep-filter post-processing intact.

| Variant | PESQ | STOI | SI-SDR (dB) | Size |
|---------|------|------|--------|------|
| PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
| CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
| **CoreML INT8 (this repo)** | **2.907** | **0.947** | **18.11** | **2.2 MB** |

INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR
−0.08 dB, STOI identical) while cutting size by 48%.
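
The comparison between the FP16 and INT8 rows can be reproduced with a few lines of arithmetic. The sketch below uses only the rounded values published in the table, so the last digit may differ from figures derived from unrounded scores; the `Scores` type is a hypothetical helper, not part of any library.

```swift
import Foundation

// Scores copied from the table above (rounded as published).
struct Scores {
    let pesq: Double
    let stoi: Double
    let siSDR: Double   // dB
    let sizeMB: Double
}

let fp16 = Scores(pesq: 2.901, stoi: 0.947, siSDR: 18.19, sizeMB: 4.2)
let int8 = Scores(pesq: 2.907, stoi: 0.947, siSDR: 18.11, sizeMB: 2.2)

// Differences are INT8 minus FP16; positive means INT8 scored higher.
let deltaPESQ = int8.pesq - fp16.pesq
let deltaSTOI = int8.stoi - fp16.stoi
let deltaSISDR = int8.siSDR - fp16.siSDR

// Relative size reduction in percent.
let sizeReductionPercent = (1.0 - int8.sizeMB / fp16.sizeMB) * 100.0

print(String(format: "dPESQ %+.3f, dSTOI %+.3f, dSI-SDR %+.2f dB", deltaPESQ, deltaSTOI, deltaSISDR))
print(String(format: "size reduction %.0f%%", sizeReductionPercent))
```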

## Latency (M2 Max)

| Audio duration | Processing time | RTF |
|----------|------|-----|
| 5 s | 0.65 s | 0.13 |
| 10 s | 1.2 s | 0.12 |
| 20 s | 4.8 s | 0.24 |
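
RTF (real-time factor) is processing time divided by audio duration, so values below 1 mean faster than real time. A minimal sketch of that calculation, using the rows from the table above and a hypothetical helper name:

```swift
// Real-time factor: processing time divided by audio duration.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    processingSeconds / audioSeconds
}

// (audio duration, processing time) pairs from the latency table, in seconds.
let rows: [(audio: Double, processing: Double)] = [(5, 0.65), (10, 1.2), (20, 4.8)]

for row in rows {
    let rtf = realTimeFactor(processingSeconds: row.processing, audioSeconds: row.audio)
    let verdict = rtf < 1.0 ? "faster than real time" : "slower than real time"
    print("\(row.audio) s of audio: RTF \(rtf), \(verdict)")
}
```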

## Files

| File | Size | Description |
|------|------|-------------|
| `DeepFilterNet3.mlmodelc` | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
| `auxiliary.npz` | 126 KB | ERB filterbank, Vorbis window, normalization states |

## Usage

Add [speech-swift](https://github.com/soniqo/speech-swift) to the `dependencies` array in `Package.swift`:

```swift
.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
```

Then denoise:

```swift
import SpeechEnhancement

let enhancer = try await SpeechEnhancer.fromPretrained()
let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
```
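
For a quick smoke test without a recording at hand, a synthetic input buffer can stand in for `noisyAudio`. The sketch below is only an illustration: it assumes the enhancer accepts mono `[Float]` samples, which this card does not state, and `makeNoisyTone` is a hypothetical helper. A sine tone is not speech, so this checks that the pipeline runs rather than how well it denoises.

```swift
import Foundation

// Assumption: the input is mono Float samples at 48 kHz.
// Builds a sine tone with uniform white noise added on top.
func makeNoisyTone(seconds: Double,
                   sampleRate: Int = 48_000,
                   frequency: Double = 440,
                   noiseLevel: Float = 0.1) -> [Float] {
    let count = Int(seconds * Double(sampleRate))
    return (0..<count).map { n in
        let t = Double(n) / Double(sampleRate)
        let tone = Float(0.5 * sin(2.0 * Double.pi * frequency * t))
        let noise = Float.random(in: -noiseLevel...noiseLevel)
        return tone + noise
    }
}

let noisyAudio = makeNoisyTone(seconds: 1.0)
print(noisyAudio.count)  // 48000
```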

CLI:

```bash
swift run audio denoise noisy.wav --output clean.wav
```

## Source

- Base model: [Rikorose/DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (Apache-2.0 / MIT)

## License

- Model weights: Apache-2.0 / MIT dual license
- CoreML conversion: Apache-2.0

## Links

- [speech-swift](https://github.com/soniqo/speech-swift) — Apple SDK
- [soniqo.audio](https://soniqo.audio) — website
- [MLX vs CoreML on Apple Silicon — a practical guide](https://blog.ivan.digital/mlx-vs-coreml-on-apple-silicon-a-practical-guide-to-picking-the-right-backend-and-why-you-should-f77ddea7b27a) — related blog post
- [soniqo.audio/blog](https://soniqo.audio/blog) — blog

## Reference

- [DeepFilterNet3 paper](https://arxiv.org/abs/2305.08227)