---
license: mit
tags:
- speech-enhancement
- noise-reduction
- coreml
- apple-neural-engine
- deepfilternet
language:
- en
- multilingual
library_name: qwen3-asr-swift
pipeline_tag: audio-to-audio
---
DeepFilterNet3 - Core ML
Speech enhancement (noise removal) model converted to Core ML for Apple Neural Engine inference.
Based on DeepFilterNet3 (Interspeech 2023).
Model Details
| Property | Value |
|---|---|
| Parameters | 2.1M |
| Model size | 4.2 MB |
| Sample rate | 48 kHz |
| Latency | ~40 ms (20 ms frame + lookahead) |
| PESQ (DNS4) | 3.17 |
| Compute target | Apple Neural Engine |
| Framework | Core ML (mlprogram) |
| Min deployment | macOS 14+ / iOS 17+ |
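As a quick check of the figures in the table, the frame length in samples follows directly from the sample rate and the frame duration. The sketch below is plain arithmetic; `samplesPerFrame` is an illustrative name, not part of qwen3-asr-swift.

```swift
/// Number of samples covered by `milliseconds` of audio at `sampleRate` Hz.
/// Illustrative helper, not a library function.
func samplesPerFrame(sampleRate: Int, milliseconds: Int) -> Int {
    return sampleRate * milliseconds / 1000
}

// 20 ms frame at 48 kHz
let frameSamples = samplesPerFrame(sampleRate: 48_000, milliseconds: 20)
// Approximate 40 ms end-to-end latency expressed in samples
let latencySamples = samplesPerFrame(sampleRate: 48_000, milliseconds: 40)
print(frameSamples, latencySamples)
```

A 20 ms frame therefore spans 960 samples, and the stated latency of roughly 40 ms corresponds to about 1920 samples.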
Architecture
Signal processing (STFT, ERB filterbank, deep filtering) runs on CPU via Accelerate/vDSP. Neural network inference runs on the Neural Engine via Core ML.
- Encoder: 4x SepConv2d + SqueezedGRU (256-dim, 3 layers)
- ERB Decoder: SqueezedGRU + skip convs + sigmoid mask (32 bands)
- DF Decoder: SqueezedGRU + deep filter coefficients (96 bins x 5 taps)
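The DF decoder's output (96 bins x 5 taps) can be read as a short complex FIR filter applied along the time axis of each filtered bin. The following is a minimal sketch of that multiply-accumulate step, under stated assumptions: coefficients and spectrum values are complex pairs, and `history[tap]` holds the spectrum frame that lines up with coefficient row `tap`. The actual tap ordering, lookahead alignment, and memory layout used by the model are not specified here and may differ. `Complex` and `deepFilter` are hypothetical names, and the loop is written in plain Swift for readability rather than with Accelerate/vDSP.

```swift
/// Minimal complex number type for this sketch (hypothetical, not a library type).
struct Complex: Equatable {
    var re: Float
    var im: Float

    static func * (a: Complex, b: Complex) -> Complex {
        return Complex(re: a.re * b.re - a.im * b.im,
                       im: a.re * b.im + a.im * b.re)
    }

    static func + (a: Complex, b: Complex) -> Complex {
        return Complex(re: a.re + b.re, im: a.im + b.im)
    }
}

/// Applies deep-filter coefficients to produce one filtered frame.
/// - Parameters:
///   - history: one spectrum frame per tap, each with at least `bins` entries
///     (the frame-to-tap alignment is an assumption of this sketch).
///   - coefficients: complex coefficients indexed as `[tap][bin]`.
/// - Returns: the filtered values for the first `bins` frequency bins.
func deepFilter(history: [[Complex]], coefficients: [[Complex]]) -> [Complex] {
    precondition(history.count == coefficients.count, "one frame per tap")
    guard let bins = coefficients.first?.count else { return [] }

    var output = [Complex](repeating: Complex(re: 0, im: 0), count: bins)
    for tap in 0..<coefficients.count {
        precondition(coefficients[tap].count == bins, "ragged coefficient rows")
        precondition(history[tap].count >= bins, "frame shorter than bin count")
        for bin in 0..<bins {
            // Complex multiply-accumulate across taps for each bin.
            output[bin] = output[bin] + coefficients[tap][bin] * history[tap][bin]
        }
    }
    return output
}

// Toy example: 2 taps, 1 bin. (2 + 0i)(1 + 0i) + (0 + 1i)(0 + 1i) = 2 - 1 = 1.
let toyHistory = [[Complex(re: 1, im: 0)], [Complex(re: 0, im: 1)]]
let toyCoefficients = [[Complex(re: 2, im: 0)], [Complex(re: 0, im: 1)]]
let toyFiltered = deepFilter(history: toyHistory, coefficients: toyCoefficients)
```

For the configuration listed above, `coefficients` would have 5 rows of 96 entries each; bins outside the deep-filtered range would be handled by the ERB mask path instead.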
Usage with qwen3-asr-swift
import SpeechEnhancement
let enhancer = try await SpeechEnhancer.fromPretrained()
let cleanAudio = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
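The call above passes audio at 48 kHz. If the source is interleaved 16-bit PCM, it needs converting to floating point first. The sketch below assumes that `enhance` accepts mono `[Float]` samples in the range [-1, 1], which the snippet above does not confirm; `monoFloatSamples` is a hypothetical helper and not part of qwen3-asr-swift.

```swift
/// Converts interleaved 16-bit PCM to mono Float samples in [-1, 1)
/// by averaging the channels of each frame. Hypothetical helper.
func monoFloatSamples(fromInterleaved pcm: [Int16], channels: Int) -> [Float] {
    precondition(channels > 0, "channel count must be positive")
    let frameCount = pcm.count / channels  // trailing partial frame is ignored
    var result = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for channel in 0..<channels {
            // Scale Int16 range [-32768, 32767] to [-1, 1).
            sum += Float(pcm[frame * channels + channel]) / 32768.0
        }
        result[frame] = sum / Float(channels)
    }
    return result
}

// Two stereo frames: (0.5, 0.5) and (-1.0, 0.0).
let stereoPCM: [Int16] = [16384, 16384, -32768, 0]
let monoSamples = monoFloatSamples(fromInterleaved: stereoPCM, channels: 2)
```

Under the same assumption, `monoSamples` could then be passed as the `audio:` argument with `sampleRate: 48000`, provided the recording was captured at 48 kHz.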
CLI:
audio denoise input.wav --output clean.wav
Performance
| Metric | Value |
|---|---|
| RTF (M2 Max) | 0.34 (~3x faster than real time) |
| 20 s audio | ~7 s processing |
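The two rows are related through the real-time factor (RTF), defined as processing time divided by audio duration. A small worked example, using only the numbers from the table (function names are illustrative):

```swift
/// Estimated processing time for a clip, given a real-time factor.
func processingTime(audioSeconds: Double, rtf: Double) -> Double {
    return audioSeconds * rtf
}

/// How many times faster than real time a given RTF is.
func speedup(rtf: Double) -> Double {
    precondition(rtf > 0, "RTF must be positive")
    return 1.0 / rtf
}

let estimatedSeconds = processingTime(audioSeconds: 20, rtf: 0.34) // about 6.8 s
let speedFactor = speedup(rtf: 0.34)                               // about 2.9x
print(estimatedSeconds, speedFactor)
```

An RTF of 0.34 gives roughly 6.8 s for a 20 s clip, which matches the rounded figure of about 7 s.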
Files
- DeepFilterNet3.mlpackage/: Core ML model (Neural Engine)
- auxiliary.npz: Signal processing data (ERB filterbank, Vorbis window, normalization states)
Conversion
Converted from the PyTorch checkpoint using scripts/convert_deepfilternet3.py in qwen3-asr-swift.
License
MIT, following the original DeepFilterNet3 license.
Citation
@inproceedings{schroeter2023deepfilternet3,
title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
author={Schroeter, Hendrik and Maier, Andreas and Escalante-B, Alberto N and Rosenkranz, Tobias},
booktitle={Interspeech},
year={2023}
}