---
license: apache-2.0
tags:
- speech-enhancement
- denoising
- coreml
- apple-silicon
- deepfilternet
- int8
- palettization
base_model: Rikorose/DeepFilterNet3
library_name: coreml
pipeline_tag: audio-to-audio
---
DeepFilterNet3 — CoreML INT8
Real-time speech enhancement for Apple Silicon: removes background noise from speech audio and runs on the Neural Engine via CoreML.
- 2.1M params, INT8 k-means palettization, 2.2 MB
- 48 kHz native, 10 ms frames
- Requires macOS 14+ / iOS 17+
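The figures in the list above can be cross-checked with simple arithmetic. The sketch below assumes one byte per weight for 8-bit palettized indices and two bytes per weight for FP16, and it ignores lookup tables and file metadata. The constant names are illustrative and not part of any API.

```swift
// Back-of-envelope checks for the numbers listed above.
// All names are illustrative; they are not part of speech-swift or CoreML.
let sampleRate = 48_000          // Hz, from the list above
let frameMilliseconds = 10       // frame duration, from the list above
let parameterCount = 2_100_000   // "2.1M params", rounded

// Samples in one 10 ms frame at 48 kHz.
let samplesPerFrame = sampleRate * frameMilliseconds / 1_000

// Frames the model has to process per second of audio.
let framesPerSecond = sampleRate / samplesPerFrame

// Assumption: 8-bit palettized weights store one index byte per parameter
// (per-tensor lookup tables and file metadata are ignored here).
let int8Megabytes = Double(parameterCount) * 1.0 / 1_000_000

// Assumption: FP16 stores two bytes per parameter.
let fp16Megabytes = Double(parameterCount) * 2.0 / 1_000_000

print(samplesPerFrame, framesPerSecond, int8Megabytes, fp16Megabytes)
```

With these inputs the sketch gives 480 samples per frame and 100 frames per second, and the size estimates come out at roughly 2.1 MB and 4.2 MB, close to the file sizes listed in this card.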
Quality
Measured on 30 VoiceBank-DEMAND test clips using a Python CoreMLBackend,
which replaces only the neural-network forward pass and keeps the PyTorch
STFT, ERB, and deep-filter post-processing intact.
| Variant | PESQ | STOI | SI-SDR (dB) | Size |
|---|---|---|---|---|
| PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
| CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
| CoreML INT8 (this repo) | 2.907 | 0.947 | 18.11 | 2.2 MB |
INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR −0.07 dB, STOI identical) while cutting size by 48%.
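For readers unfamiliar with SI-SDR, here is a minimal sketch of the commonly used definition: the estimate is projected onto the clean reference, and the ratio of the projected energy to the residual energy is reported in dB. This is not necessarily the implementation used for the table above (the card does not say which one was used), and the function name `siSDR` is hypothetical.

```swift
import Foundation

/// Scale-invariant signal-to-distortion ratio in dB (illustrative helper).
/// `reference` is the clean signal, `estimate` the enhanced one.
/// Returns nil for mismatched or empty inputs, or for a silent reference.
func siSDR(reference: [Float], estimate: [Float]) -> Double? {
    guard reference.count == estimate.count, !reference.isEmpty else { return nil }
    let n = Double(reference.count)

    // Remove the mean of each signal so a DC offset does not affect the score.
    let refMean = reference.reduce(0.0) { acc, x in acc + Double(x) } / n
    let estMean = estimate.reduce(0.0) { acc, x in acc + Double(x) } / n
    let ref = reference.map { Double($0) - refMean }
    let est = estimate.map { Double($0) - estMean }

    let refEnergy = ref.reduce(0.0) { acc, x in acc + x * x }
    guard refEnergy > 0 else { return nil }

    // Project the estimate onto the reference to get the scaled target.
    let dot = zip(ref, est).reduce(0.0) { acc, pair in acc + pair.0 * pair.1 }
    let scale = dot / refEnergy
    let target = ref.map { $0 * scale }

    // Whatever remains after removing the target counts as distortion.
    let targetEnergy = target.reduce(0.0) { acc, x in acc + x * x }
    let noiseEnergy = zip(est, target).reduce(0.0) { acc, pair in
        let d = pair.0 - pair.1
        return acc + d * d
    }
    guard noiseEnergy > 0 else { return Double.infinity }
    return 10 * log10(targetEnergy / noiseEnergy)
}
```

Because the target is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score unchanged, which is why the metric is called scale-invariant.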
Latency (M2 Max)
| Audio duration | Processing time | RTF (time / duration) |
|---|---|---|
| 5 s | 0.65 s | 0.13 |
| 10 s | 1.2 s | 0.12 |
| 20 s | 4.8 s | 0.24 |
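The RTF column is the processing time divided by the audio duration, so values below 1 mean faster than real time. The sketch below recomputes it from the rows of the table; the helper name `realTimeFactor` is illustrative and not part of speech-swift.

```swift
/// Real-time factor: processing time divided by audio duration.
/// Values below 1 mean the audio is processed faster than real time.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

// Rows from the latency table above: audio duration and processing time in seconds.
let measurements: [(audio: Double, time: Double)] = [
    (audio: 5, time: 0.65),
    (audio: 10, time: 1.2),
    (audio: 20, time: 4.8),
]

for m in measurements {
    let rtf = realTimeFactor(processingSeconds: m.time, audioSeconds: m.audio)
    // Round to two decimals to match the precision used in the table.
    print("\(m.audio) s -> RTF \((rtf * 100).rounded() / 100)")
}
```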
Files
| File | Size | Description |
|---|---|---|
| `DeepFilterNet3.mlmodelc` | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
| `auxiliary.npz` | 126 KB | ERB filterbank, Vorbis window, normalization states |
Usage
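Add speech-swift to the dependencies in Package.swift:

```swift
.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
```

Then denoise:

```swift
import SpeechEnhancement

// Load the pretrained DeepFilterNet3 CoreML model.
let enhancer = try await SpeechEnhancer.fromPretrained()
// noisyAudio holds the noisy input samples at 48 kHz.
let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
```

CLI:

```shell
swift run audio denoise noisy.wav --output clean.wav
```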
Source
- Base model: Rikorose/DeepFilterNet3 (Apache-2.0 / MIT dual license)
License
- Model weights: Apache-2.0 / MIT dual license
- CoreML conversion: Apache-2.0
Links
- speech-swift — Apple SDK
- soniqo.audio — website
- MLX vs CoreML on Apple Silicon — a practical guide — related blog post
- soniqo.audio/blog — blog