license: mit
tags:
  - speech-enhancement
  - noise-reduction
  - coreml
  - apple-neural-engine
  - deepfilternet
language:
  - en
  - multilingual
library_name: qwen3-asr-swift
pipeline_tag: audio-to-audio

DeepFilterNet3 - Core ML

Speech enhancement (noise removal) model converted to Core ML for Apple Neural Engine inference.

Based on DeepFilterNet3 (Interspeech 2023).

Model Details

Property	Value
Parameters	2.1M
Model size	4.2 MB
Sample rate	48 kHz
Latency	~40ms (20ms frame + lookahead)
PESQ (DNS4)	3.17
Compute target	Apple Neural Engine
Framework	Core ML (mlprogram)
Min deployment	macOS 14+ / iOS 17+
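
The latency and sample-rate rows can be turned into sample counts with a little arithmetic. The sketch below is illustrative only: the helper name `samples(milliseconds:sampleRate:)` is hypothetical and not part of qwen3-asr-swift, and it simply converts the durations quoted in the table at 48 kHz.

```swift
/// Converts a duration in milliseconds to a whole number of samples.
/// Hypothetical helper for illustration; not part of the library.
func samples(milliseconds: Double, sampleRate: Double) -> Int {
    Int((milliseconds / 1000.0 * sampleRate).rounded())
}

// One 20 ms frame at 48 kHz.
let frameSamples = samples(milliseconds: 20, sampleRate: 48_000)

// The quoted ~40 ms total latency (frame + lookahead) at 48 kHz.
let latencySamples = samples(milliseconds: 40, sampleRate: 48_000)
```

Under these numbers a frame spans 960 samples and the total algorithmic delay is roughly 1920 samples.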

Architecture

Signal processing (STFT, ERB filterbank, deep filtering) runs on CPU via Accelerate/vDSP. Neural network inference runs on the Neural Engine via Core ML.

Encoder: 4x SepConv2d + SqueezedGRU (256-dim, 3 layers)
ERB Decoder: SqueezedGRU + skip convs + sigmoid mask (32 bands)
DF Decoder: SqueezedGRU + deep filter coefficients (96 bins x 5 taps)
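
To make the deep filtering step more concrete, here is a minimal sketch of applying a 5-tap complex filter to a single frequency bin: the output is the sum of each predicted coefficient multiplied by the corresponding buffered STFT frame. The `Complex` type and `deepFilterBin` function are hypothetical names introduced for illustration, and the purely causal tap alignment (tap `k` pairs with frame `t - k`) is an assumption; the actual implementation uses vDSP and may align taps differently to account for lookahead.

```swift
/// Minimal complex number so the sketch has no external dependencies.
struct Complex {
    var re: Float
    var im: Float

    static func * (a: Complex, b: Complex) -> Complex {
        Complex(re: a.re * b.re - a.im * b.im,
                im: a.re * b.im + a.im * b.re)
    }

    static func + (a: Complex, b: Complex) -> Complex {
        Complex(re: a.re + b.re, im: a.im + b.im)
    }
}

/// Applies a multi-tap complex filter to one frequency bin.
/// `history[k]` is the bin's STFT value at frame `t - k` (assumed alignment),
/// and `taps[k]` is the coefficient predicted for that frame offset.
func deepFilterBin(history: [Complex], taps: [Complex]) -> Complex {
    precondition(history.count == taps.count, "one coefficient per buffered frame")
    var out = Complex(re: 0, im: 0)
    for k in 0..<taps.count {
        out = out + taps[k] * history[k]
    }
    return out
}

// Example with 5 taps: pass the current frame through unchanged.
let history = [Complex(re: 1, im: 2), Complex(re: 3, im: 4), Complex(re: 5, im: 6),
               Complex(re: 7, im: 8), Complex(re: 9, im: 10)]
var taps = [Complex](repeating: Complex(re: 0, im: 0), count: 5)
taps[0] = Complex(re: 1, im: 0)
let passthrough = deepFilterBin(history: history, taps: taps)

// Replace the filter with a 90-degree rotation of the previous frame.
taps[0] = Complex(re: 0, im: 0)
taps[1] = Complex(re: 0, im: 1)
let rotated = deepFilterBin(history: history, taps: taps)
```

In the full model this would run for each of the 96 deep-filtered bins per frame, while the remaining spectrum is shaped by the 32-band ERB mask.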

Usage with qwen3-asr-swift

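import SpeechEnhancement

// Load the pretrained DeepFilterNet3 Core ML model.
let enhancer = try await SpeechEnhancer.fromPretrained()
// Denoise the input samples; the model operates at 48 kHz.
let cleanAudio = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)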
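
The snippet above leaves `noisyAudio` undefined. For a quick experiment, a synthetic buffer can stand in for a real recording. The sketch below assumes that `enhance(audio:sampleRate:)` accepts mono samples as `[Float]`, which the source does not confirm; `makeNoisyTone` is a hypothetical helper, not a library function.

```swift
import Foundation

/// Builds a mono test buffer: a half-amplitude sine tone plus uniform white noise.
/// Hypothetical helper for illustration; not part of qwen3-asr-swift.
func makeNoisyTone(seconds: Double,
                   sampleRate: Double = 48_000,
                   frequency: Double = 440,
                   noiseLevel: Float = 0.1) -> [Float] {
    let count = Int(seconds * sampleRate)
    return (0..<count).map { n -> Float in
        let phase = 2.0 * Double.pi * frequency * Double(n) / sampleRate
        let tone = Float(sin(phase))
        let noise = Float.random(in: -noiseLevel...noiseLevel)
        return 0.5 * tone + noise
    }
}

// One second of noisy audio at 48 kHz (assumed input format).
let noisyAudio = makeNoisyTone(seconds: 1.0)
```

A real recording should be resampled to 48 kHz before being passed in, since that is the sample rate the model is listed as operating at.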

CLI:

audio denoise input.wav --output clean.wav

Performance

Metric	Value
RTF (M2 Max)	0.34 (about 3x faster than real time)
20s audio	~7s processing
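
The two rows are related by the definition of the real-time factor (processing time divided by audio duration). The sketch below reproduces the arithmetic; the function names are hypothetical and exist only for illustration.

```swift
/// Expected processing time for a clip, given a real-time factor (RTF).
func processingSeconds(audioSeconds: Double, rtf: Double) -> Double {
    audioSeconds * rtf
}

/// Speed relative to real time is the reciprocal of the RTF.
func speedup(rtf: Double) -> Double {
    1.0 / rtf
}

// 20 s of audio at RTF 0.34 takes about 6.8 s, which rounds to the ~7 s quoted above.
let seconds = processingSeconds(audioSeconds: 20, rtf: 0.34)

// An RTF of 0.34 corresponds to roughly 2.9x real time.
let factor = speedup(rtf: 0.34)
```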

Files

DeepFilterNet3.mlpackage/ - Core ML model (Neural Engine)
auxiliary.npz - Signal processing data (ERB filterbank, Vorbis window, normalization states)

Conversion

Converted from the original PyTorch checkpoint using scripts/convert_deepfilternet3.py in qwen3-asr-swift.

License

MIT (following the original DeepFilterNet3 license)

Citation

@inproceedings{schroeter2023deepfilternet3,
  title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
  author={Schroeter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle={Interspeech},
  year={2023}
}