metadata
license: mit
tags:
  - speech-enhancement
  - noise-reduction
  - coreml
  - apple-neural-engine
  - deepfilternet
language:
  - en
  - multilingual
library_name: qwen3-asr-swift
pipeline_tag: audio-to-audio

DeepFilterNet3 - Core ML

A speech enhancement (noise removal) model, converted to Core ML for inference on the Apple Neural Engine.

Based on DeepFilterNet3 (Interspeech 2023).

Model Details

| Property | Value |
|---|---|
| Parameters | 2.1M |
| Model size | 4.2 MB |
| Sample rate | 48 kHz |
| Latency | ~40 ms (20 ms frame + lookahead) |
| PESQ (DNS4) | 3.17 |
| Compute target | Apple Neural Engine |
| Framework | Core ML (mlprogram) |
| Min deployment | macOS 14+ / iOS 17+ |
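
The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes MB means 10^6 bytes and that the 40 ms latency splits into one 20 ms frame plus the remainder as lookahead. The result of 2 bytes per parameter is consistent with 16-bit weights, but the model card does not state the weight precision, so treat that as an inference rather than a fact.

```python
# Values taken from the Model Details table.
SAMPLE_RATE_HZ = 48_000
FRAME_MS = 20
TOTAL_LATENCY_MS = 40
PARAMS = 2.1e6
MODEL_SIZE_BYTES = 4.2e6  # assumption: 1 MB = 10**6 bytes

# Samples in one 20 ms frame at 48 kHz.
frame_samples = SAMPLE_RATE_HZ * FRAME_MS // 1000

# Latency left over after the frame itself (assumed to be lookahead).
lookahead_ms = TOTAL_LATENCY_MS - FRAME_MS

# Average storage per parameter implied by the table.
bytes_per_param = MODEL_SIZE_BYTES / PARAMS

print(frame_samples, lookahead_ms, bytes_per_param)
```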

Architecture

Signal processing (STFT, ERB filterbank, deep filtering) runs on CPU via Accelerate/vDSP. Neural network inference runs on the Neural Engine via Core ML.
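
The STFT stage needs an analysis window, and the Files section lists a Vorbis window among the auxiliary data. As a rough illustration, the standard Vorbis power-complementary window can be generated as below. The length of 960 samples is an assumption derived from a 20 ms frame at 48 kHz; the window stored in auxiliary.npz is the authoritative one and may differ.

```python
import math


def vorbis_window(length):
    """Standard Vorbis window: sin(pi/2 * sin^2(pi * (n + 0.5) / length))."""
    return [
        math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / length) ** 2)
        for n in range(length)
    ]


# Assumed frame length: 20 ms at 48 kHz.
window = vorbis_window(960)

# With 50% overlap, the squared windows of adjacent frames sum to one,
# which is what makes overlap-add reconstruction possible.
half = len(window) // 2
max_error = max(abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) for n in range(half))
print(max_error)
```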

  • Encoder: 4x SepConv2d + SqueezedGRU (256-dim, 3 layers)
  • ERB Decoder: SqueezedGRU + skip convs + sigmoid mask (32 bands)
  • DF Decoder: SqueezedGRU + deep filter coefficients (96 bins x 5 taps)
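
To make the DF Decoder's output concrete, here is a minimal sketch of how deep filter coefficients might be applied to a spectrogram: each of the lowest bins is replaced by a complex weighted sum over the current and previous frames. The function name, the `coefs[t][k][f]` layout, and the causal-only tap alignment (no lookahead) are assumptions made for illustration; the ERB mask that handles the remaining bins is left out.

```python
def deep_filter(spec, coefs, df_bins=96):
    """Apply per-frame complex filter taps to the lowest `df_bins` bins.

    spec:  list of frames, each a list of complex STFT bins.
    coefs: coefs[t][k][f] is the complex coefficient for frame t, tap k, bin f
           (hypothetical layout; tap k looks back k frames).
    """
    out = [list(frame) for frame in spec]  # bins above df_bins pass through
    for t in range(len(spec)):
        for f in range(df_bins):
            acc = 0j
            for k, tap in enumerate(coefs[t]):
                if t - k >= 0:  # frames before the start count as zero
                    acc += tap[f] * spec[t - k][f]
            out[t][f] = acc
    return out


# Tiny demo with 2 filtered bins and 2 taps instead of 96 bins and 5 taps.
frames = [[1 + 1j, 2 + 0j, 3 + 0j], [4 + 0j, 5 + 0j, 6 + 0j]]
identity = [[[1, 1], [0, 0]] for _ in frames]  # tap 0 = 1, tap 1 = 0
assert deep_filter(frames, identity, df_bins=2) == frames
```

With 96 bins and 5 taps, the network would supply 480 complex coefficients per frame under this layout.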

Usage with qwen3-asr-swift

import SpeechEnhancement

let enhancer = try await SpeechEnhancer.fromPretrained()
let cleanAudio = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)

CLI:

audio denoise input.wav --output clean.wav

Performance

| Metric | Value |
|---|---|
| RTF (M2 Max) | 0.34 (about 3x faster than real time) |
| 20 s of audio | ~7 s processing time |
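
The two rows are related by the definition of real-time factor (processing time divided by audio duration). A short sketch of that arithmetic, with helper names chosen here for illustration:

```python
def processing_seconds(audio_seconds, rtf):
    """Expected processing time for a clip, given a real-time factor."""
    return audio_seconds * rtf


def speedup(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


print(round(processing_seconds(20, 0.34), 1))  # 6.8
print(round(speedup(0.34), 2))  # 2.94
```

So 20 s of audio at RTF 0.34 takes roughly 6.8 s, which the table rounds to ~7 s, and the speedup is a little under 3x.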

Files

  • DeepFilterNet3.mlpackage/ - Core ML model (Neural Engine)
  • auxiliary.npz - Signal processing data (ERB filterbank, Vorbis window, normalization states)
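
An .npz file is a zip archive of .npy arrays, so its contents can be listed with the Python standard library alone. The sketch below does not assume any particular array names inside auxiliary.npz; it only reports what is there. The helper name is made up for this example.

```python
import sys
import zipfile


def list_npz_arrays(path):
    """Return the sorted array names stored in an .npz archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[:-4] for name in archive.namelist() if name.endswith(".npy")
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python list_npz.py auxiliary.npz
    print(list_npz_arrays(sys.argv[1]))
```

Loading the arrays themselves would typically be done with NumPy's `numpy.load`, which returns a mapping from these names to arrays.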

Conversion

Converted from the PyTorch checkpoint using scripts/convert_deepfilternet3.py in the qwen3-asr-swift repository.

License

MIT (following the original DeepFilterNet3 license)

Citation

@inproceedings{schroeter2023deepfilternet3,
  title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
  author={Schroeter, Hendrik and Maier, Andreas and Escalante-B, Alberto N and Rosenkranz, Tobias},
  booktitle={Interspeech},
  year={2023}
}